2.  Architecture


2.1.  Memory  Organization 

The memory is composed of 32-bit words and it is a uniform address space starting at 0 and ending at 2^32 - 1. Each
memory location is a byte. Load/store addresses are manipulated as 32-bit byte addresses on-chip but only words can
be read from memory (i.e., only the top 30 bits are sent to the memory system). The numbering of words in memory is
shown in Figure 2-1. Bytes (characters) are accessed by sequences of instructions that can do insertion or extraction of
characters into or from a word (see Appendix I). Instructions that affect the program counter, such as branches and
jumps, generate word addresses. This means that the offsets used for calculating load/store addresses are byte offsets,
and displacements for branches and jumps are word displacements. The addressing is consistently Big Endian [1].


| Word 0 | Word 1 | Word 2 | ... | Word 2^30 - 1 |

Figure  2-1:  Word  Numbering  in  Memory 


Bytes are numbered starting with the most significant byte at the most significant bit end of the word. The bits in a
word are numbered 0 to 31 starting at the most significant bit (MSB) and going to the least significant bit (LSB). Bit
and byte numbering are shown in Figure 2-2.


Bit:     0        7 8       15 16      23 24      31
        +----------+----------+----------+----------+
        |  Byte 0  |  Byte 1  |  Byte 2  |  Byte 3  |
        +----------+----------+----------+----------+
         (MSB end)                        (LSB end)

Figure  2-2:  Bit  and  Byte  Numbering  in  a  Word 
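
The byte numbering above can be illustrated with a small sketch. This is not the instruction sequence from Appendix I; it is only a model of which bits a given byte number selects, and the function names extract_byte and insert_byte are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF


def extract_byte(word: int, byte_index: int) -> int:
    """Return byte `byte_index` of a 32-bit word (0 = MSB end, 3 = LSB end)."""
    if not 0 <= byte_index <= 3:
        raise ValueError("byte index must be 0..3")
    shift = 8 * (3 - byte_index)  # byte 0 sits in the top 8 bits
    return (word >> shift) & 0xFF


def insert_byte(word: int, byte_index: int, value: int) -> int:
    """Return `word` with byte `byte_index` replaced by the low 8 bits of `value`."""
    if not 0 <= byte_index <= 3:
        raise ValueError("byte index must be 0..3")
    shift = 8 * (3 - byte_index)
    cleared = word & ~(0xFF << shift) & WORD_MASK
    return cleared | ((value & 0xFF) << shift)
```

For the word 0x11223344, byte 0 is 0x11 and byte 3 is 0x44 under this numbering.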


The address space is divided into system and user space. An address with the high order bit (bit 0) set to one (1) will
access user space. If the high order bit is zero (0) then a system space address is accessed. Programs executing in user
space cannot access system space. Programs executing in system space can access both system and user space.
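
As a rough illustration of the addressing rules in this section, the sketch below splits a 32-bit byte address into the 30-bit word address sent to the memory system and the byte number within the word, and classifies the address as user or system space. The helper names are hypothetical and the code only restates the text above.

```python
ADDRESS_MASK = 0xFFFFFFFF
USER_SPACE_BIT = 0x80000000  # bit 0 in the manual's numbering (the MSB)


def word_address(byte_address: int) -> int:
    """Top 30 bits of the byte address: the word number sent to memory."""
    return (byte_address & ADDRESS_MASK) >> 2


def byte_within_word(byte_address: int) -> int:
    """Low 2 bits of the byte address: byte 0 (MSB end) to byte 3 (LSB end)."""
    return byte_address & 0x3


def is_user_space(byte_address: int) -> bool:
    """True when the high order bit of the address is 1."""
    return bool(byte_address & USER_SPACE_BIT)
```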

2.2.  General  Purpose  Registers 

There are 32 general purpose registers (GPRs) numbered 0 through 31. These are the registers named in the register
fields of the instructions. All registers are 32 bits. Of these registers, one register is not general purpose. Register 0
(r0) contains the constant 0 and thus cannot be changed. The constant 0 is used very frequently so it is the value that is
stored in the constant register. A constant register has one added advantage. One register is needed as a void
destination for instructions that do no writes or instructions that are being nop'ed because they must be stopped for some
reason. This is implemented most easily by writing to a constant location.
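
A minimal sketch of the constant-register behaviour, assuming a simple software model of the register file (the class name RegisterFile is hypothetical): writes to r0 are accepted but discarded, so r0 can serve as a void destination.

```python
class RegisterFile:
    """Toy model of 32 registers of 32 bits each, with r0 fixed at 0."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, n: int) -> int:
        return self._regs[n]

    def write(self, n: int, value: int) -> None:
        if n == 0:
            return  # r0 is the constant 0: the write has no effect
        self._regs[n] = value & 0xFFFFFFFF
```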


2.3.  Special  Registers 

There  are  several  special  registers  that  can  be  accessed  with  the  Move  Special  instructions.  They  are: 

PSW  The  processor  status  word.  This  is  described  in  more  detail  in  Section  2.4. 

PC-4,  PC-1  Locations  in  the  PC  chain  used  for  saving  and  restoring  the  state  of  the  PC  chain. 

MD  The  mul/div  register.  This  is  a  special  register  used  during  multiplication  and  division. 

2.4.  The  Processor  Status  Word 

The Processor Status Word (PSW) holds some of the information pertaining to the current state of the machine. The
PSW actually contains two sets of bits that are called PSWcurrent and PSWother. The current state of the machine is
always reflected in PSWcurrent. When an exception or trap occurs, the contents of PSWcurrent are copied into
PSWother. The e bit is not saved. PSWother then contains the processor state from before the exception or trap so that
it can be saved. Interrupts are disabled, PC shifting is disabled, overflows are masked and the processor is put into
system state. The I bit is cleared if the exception was an interrupt. A jump PC and restore state instruction (jpcrs)
causes PSWother to be copied into PSWcurrent. After the ALU cycle of the jpcrs instruction, the interrupts are enabled
and the processor returns to user state with its state restored. Appendix VI describes the trap and interrupt handling
mechanisms.
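
The copy between PSWcurrent and PSWother can be sketched as follows. This is a simplified model under stated assumptions: only the I, M, U, S, V and O bits are represented, the e bit and the E-bit shift chain are left out, pipeline timing is ignored, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StateBits:
    I: int = 1  # 0 when the exception was an interrupt, otherwise 1
    M: int = 0  # 1 = interrupts masked
    U: int = 1  # 1 = user state
    S: int = 1  # 1 = PC chain shifting enabled
    V: int = 0  # 1 = traps on overflow masked
    O: int = 0  # 1 = the exception was a trap on overflow


@dataclass
class PSW:
    current: StateBits
    other: StateBits

    def take_exception(self, is_interrupt: bool, is_overflow: bool = False) -> None:
        # PSWcurrent is copied into PSWother, then the machine enters system state.
        self.other = self.current
        self.current = replace(
            self.current,
            I=0 if is_interrupt else 1,
            M=1,  # interrupts disabled
            U=0,  # system state
            S=0,  # PC shifting disabled
            V=1,  # overflows masked
            O=1 if is_overflow else 0,
        )

    def jpcrs(self) -> None:
        # jump PC and restore state: PSWother is copied back into PSWcurrent.
        self.current = self.other
```

After take_exception(is_interrupt=True) followed by jpcrs(), the model is back in the state it had before the interrupt.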

The PSW can be both read and written while in system space, but a write to the PSW while in user space has no
effect. To change the current state of the machine via the PSW, a move to special (movtos) instruction must be used to
write the bits in PSWcurrent. Before restoring the state of the machine, a move to special instruction must be used to
change the bits in PSWother. All the bits are writable except the e bit and the E-bit shift chain.

The assignment of bits is shown in Figure 2-3. The bits corresponding to PSWcurrent are shown in upper case and
those in lower case correspond to the bits in PSWother. The bits are:

I, i    The I bit should be checked by the exception handler. It is set to 0 when there is an interrupt
        request, otherwise it will be set to 1. This bit never needs to be written but the value will be
        retained until the next interrupt or exception. The i bit contains the previous value of the I bit but in
        general has no meaning since only the I bit needs to be looked at when an exception occurs.

M, m    Interrupt mask. When set to 1, the processor will not recognize interrupts. Can only be changed by
        a system process, an interrupt or a trap instruction.

U, u    When set to 1, the processor is executing in user state. Can only be changed by a system process,
        an interrupt or a trap instruction.

S, s    Set to 1 when shifting of the PC chain is enabled.

e       Clear when doing an exception or trap return sequence. Used to determine whether state should be
        saved if another exception occurs during the return sequence. This bit only changes after an
        exception has occurred so the exception handler must be used to inspect this bit. See Appendix VI.

E       The E bits make up a shift chain that is used to determine whether the e bit needs to be cleared when
        an exception occurs. The E bits and the e bit are visible to the programmer but cannot be written.

V, v    The overflow mask bit. Traps on overflows are prevented when this bit is set. See Section 2.4.1.

O, o    This bit gets set or cleared on every exception. When a trap on overflow occurs, the O bit is set to 1
        as seen by the exception handler. This bit never needs to be written. The o bit contains the previous
        value of the O bit but in general has no meaning.

Figure 2-3: The Processor Status Word

2.4.1.  Trap  on  Overflow 

If the overflow mask bit in PSWcurrent (V) is cleared, then the processor will trap to location 0 (the start of all
exception and interrupt handling routines) when an overflow occurs during ALU or multiplication/division operations.
The exception handling routine should begin the overflow trap handling routine if the overflow bit (O) is set in
PSWcurrent.

The  V  bit  can  only  be  changed  while  in  system  space  so  a  system  call  will  have  to  be  provided  for  user  space 
programs  to  set  or  clear  this  bit. 
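
A sketch of the overflow rule, assuming ordinary two's complement addition on 32-bit values: the trap is taken only when an overflow occurs and the V bit is 0. The function names are hypothetical, and the code does not model the trap sequence itself.

```python
MASK32 = 0xFFFFFFFF


def add32(a: int, b: int) -> tuple[int, bool]:
    """Add two 32-bit patterns; return (32-bit result, signed overflow flag)."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    sign_a = (a >> 31) & 1
    sign_b = (b >> 31) & 1
    sign_r = (result >> 31) & 1
    # Signed overflow: both operands share a sign and the result does not.
    overflow = sign_a == sign_b and sign_r != sign_a
    return result, overflow


def traps_on_overflow(a: int, b: int, v_bit: int) -> bool:
    """True when the add overflows and the overflow mask bit V is cleared."""
    _, overflow = add32(a, b)
    return overflow and v_bit == 0
```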

2.5.  Privilege  Violations 

User programs cannot access system space. Any attempt to access system space will result in the address being
mapped to user space. Bit 0 of the address will always be forced to 1 (a user space address) in user mode.

Attempting  to  write  to  the  PSW  while  in  user  space  will  be  the  same  as  executing  a  nop  instruction.  The  PSW  is  not 
changed  and  no  other  action  is  taken. 
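
The address mapping in user mode can be sketched as a single OR with the high order bit. The function name effective_address is hypothetical.

```python
def effective_address(address: int, user_mode: bool) -> int:
    """Return the 32-bit address actually used for the access."""
    address &= 0xFFFFFFFF
    if user_mode:
        address |= 0x80000000  # bit 0 (the MSB) is forced to 1 in user mode
    return address
```

A user program that generates the address 0x00001000 therefore reaches 0x80001000 rather than system space.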


There  are  no  illegal  instructions,  just  strange  results. 



3.  Instruction  Timing 

This  chapter  describes  the  MIPS-X  instruction  pipeline  and  the  effects  that  pipelining  has  on  the  timing  sequence  for 
various  instructions.  A  section  is  also  included  that  describes  in  detail  the  timing  of  the  various  types  of  instructions. 

3.1.  The  Instruction  Pipeline 

MIPS-X has a 5-stage pipeline with one instruction in each stage of the pipe once it has been filled. The clock is a
two-phase clock with the phases called phase 1 (φ1) and phase 2 (φ2). The names of the pipe stages and the actions that
take place in them are described in Table 3-1. The pipeline sequence is shown in Figure 3-1.


Abbreviation  Name               Action

IF            Instruction Fetch  Fetch the next instruction.

RF            Register Fetch     The instruction is decoded.
                                 The register file is accessed during the second half
                                 of the cycle (Phase 2).

ALU           ALU Cycle          An ALU or shift operation is performed.
                                 Addresses go to memory at the end of the cycle.

MEM           Memory Cycle       Waiting for the memory (external cache) to come back on a read.
                                 Data output for memory write.

WB            Write Back         The instruction result is written to the register
                                 file during the first half of the cycle (Phase 1).

Table 3-1: MIPS-X Pipeline Stages


1.  IF   RF   ALU  MEM  WB
2.       IF   RF   ALU  MEM  WB
3.            IF   RF   ALU  MEM  WB
4.                 IF   RF   ALU  MEM  WB
5.                      IF   RF   ALU  MEM

Figure 3-1: Pipeline Sequence
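
The staircase in the pipeline sequence figure can be generated from one rule: instruction i is in stage number (cycle - i) of the five stages. The sketch below assumes an ideal pipeline with no stalls; the function names are hypothetical.

```python
from typing import Optional

STAGES = ["IF", "RF", "ALU", "MEM", "WB"]


def stage_in_cycle(instruction: int, cycle: int) -> Optional[str]:
    """Stage occupied by `instruction` in `cycle` (both 1-based), or None.

    Instruction i is assumed to enter IF in cycle i.
    """
    index = cycle - instruction
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def pipeline_rows(count: int) -> list[list[str]]:
    """One row per instruction, one column per cycle; empty string when idle."""
    last_cycle = count + len(STAGES) - 1
    return [
        [stage_in_cycle(i, c) or "" for c in range(1, last_cycle + 1)]
        for i in range(1, count + 1)
    ]
```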

3.2.  Delays and Bypassing

A delay occurs because the result of a previous instruction is not available to be used by the current instruction. An
example is a compute instruction that uses the result of a load instruction. If in Figure 3-1, instruction 1 is a load
instruction, then the result of the load is not available to be read from the register file until the second half of WB in
instruction 1. The first instruction that can access the value just loaded in the registers is instruction 4 because the
registers are read on phase 2 of the cycle. This means that there is a delay of two instructions from a load instruction
until the result can be used as an operand by the ALU. An instruction delay can also be called a delay slot where an
instruction that does not depend on the previous instruction can be placed. This should be a nop if no useful instruction
can be found. Delays between instructions can sometimes be reduced or eliminated by using bypassing.
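
The two-instruction delay without bypassing can be checked with a small timing model. It assumes what the text states: the result is written to the register file in phase 1 of WB, registers are read in phase 2 of RF, and instruction i enters IF in cycle i. Time is counted in half cycles; the function names are hypothetical.

```python
STAGES = ["IF", "RF", "ALU", "MEM", "WB"]


def half_cycle(instruction: int, stage: str, phase: int) -> int:
    """Time stamp, in half cycles, of a phase (1 or 2) of a stage of an instruction."""
    cycle = instruction + STAGES.index(stage)
    return 2 * cycle + (phase - 1)


def delay_slots_without_bypass() -> int:
    """Instructions that must separate a load from the first user of its result."""
    write_time = half_cycle(1, "WB", 1)  # instruction 1 writes its result
    reader = 2
    # Advance until the register read happens after the register write.
    while half_cycle(reader, "RF", 2) <= write_time:
        reader += 1
    return reader - 2  # instructions strictly between the load and the reader
```

With these assumptions the first instruction able to read the loaded value is instruction 4, so the function returns 2.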

Bypassing allows an instruction to use the result of a previous instruction before it is written back to the register file.
This means that some of the delays can be reduced. Table 3-2 shows the number of delay slots that exist for various
pairs of instructions in MIPS-X. The table takes into account bypassing on both the results of a compute instruction and
a load instruction. For example, consider the load-address pair of instructions. This can occur if the result of the first
load is used in the address calculation for the second load instruction. Without bypassing, there would be 2 delay slots.
Table 3-2 shows only 1 delay slot because bypassing will take place.

The possible implementations for bypassing are bypassing only to Source 1 or to both Source 1 and Source 2. The
implementation of bypassing in MIPS-X uses bypassing to both sources. Bypassing only to Source 1 means that the
benefits of bypassing can only be achieved if the second instruction is accessing the value from the previous instruction
via the Source 1 register. If the second instruction can only use the value from the previous instruction as the Source 2
register, then 2 delay slots are required. Bypassing to both Sources eliminates this asymmetry. The asymmetry is most
noticeable in the number of delay slots between compute or load instructions and a following instruction that tries to
store the results of the compute or load instruction. Branches are also a problem because the comparison is done with a
subtraction of Source 1 - Source 2. Not all branch types have been implemented because it is assumed that the operands
can be reversed. This means that it will not always be possible to bypass a result to a branch instruction. This
asymmetry could be eliminated by taking one bit from the displacement field and using it to decide whether a
subtraction or a reverse subtraction should be used. The tradeoff between the two types of bypassing is the ability to
generate more efficient code in some places versus the hardware needed to implement more comparators. Table 3-2
shows the delays incurred for both implementations of bypassing. It is felt that bypassing to both Sources is preferable
and the necessary hardware has been implemented.

Instructions in the slot of load instructions should not use the same register as the one that is the destination of the
load instruction. Bypassing will occur and the instruction in the load slot will get the address being used for the load
instead of the value from the desired register.

One other effect of bypassing should be described. Consider Figure 3-1. If instruction 1 is a load to r1 and
instruction 2 is a compute instruction that puts its result also in r1, then there is an apparent conflict in instruction 3 if it
wants to use r1 as its Source 1 register. Both the results from instructions 1 and 2 will want to bypass to instruction 3.
This conflict is resolved by using the result of the second instruction. The reasoning is that this is how sequential
instructions will behave. Therefore, in this example instruction 3 will use the result of the compute instruction.
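
The conflict rule (the most recent producer wins) can be sketched as a search over in-flight results from youngest to oldest. The data layout and the function name select_bypass are assumptions made for illustration; the hardware uses comparators rather than a list.

```python
from typing import Optional


def select_bypass(source_reg: int, in_flight: list[tuple[int, int]]) -> Optional[int]:
    """Pick the bypass value for `source_reg`.

    `in_flight` holds (destination register, value) pairs for results not yet
    written back, ordered oldest first. The youngest matching entry is used,
    which matches the behaviour of sequential execution.
    """
    for dest_reg, value in reversed(in_flight):
        if dest_reg == source_reg:
            return value
    return None  # no bypass: the register file value is used
```

With a load to r1 followed by a compute into r1, a following reader of r1 gets the compute result.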

Instruction Pair      Delay Slots with   Delay Slots with   Comment
(Inst 1 - Inst 2)     Bypassing Only     Src1/Src2
                      to Source 1        Bypassing

Load - Compute
Load - Address                                              Loaded value used as address
Load - Data                                                 Loaded value used for store data
Load - Branch
Compute - Compute
Compute - Address                                           Computed value used as address
Compute - Data                                              Compute result used for store data
Compute - Branch

Table 3-2: Delay Slots for MIPS-X Instruction Pairs


3.3.  Memory  Instruction  Interlocks 

There are several instruction interlocks required because of the organization of the memory system. The external
cache is a write-back cache so it requires two memory cycles to do a store operation, one to check that the location is in
the cache and one to do the store. This means that a store instruction must be followed by a non-memory instruction so
that there can be two memory cycles available. For example, a store followed by a compute instruction is okay because
the compute instruction does not use its MEM cycle. The software should try to schedule non-memory instructions
after all stores. If this is not possible, the processor will stall until the store can complete. Scheduling a nop instruction
is not sufficient because an instruction cache miss will also generate a load cycle. This cannot be predicted so the
hardware must be able to stall the processor.


There are no restrictions for instructions after a load instruction. There is a restriction that a load instruction cannot
have as its destination the register being used to compute the address of the load. The reason is that if the load
instruction misses in the external cache, it will still overwrite its destination register. This occurs because a late miss
detect scheme is used in the external cache. The load instruction must be restartable.
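
A code scheduler or assembler could check both rules of this section with a pass like the one sketched below. The instruction representation (Instr with kind, dest and base fields) is hypothetical, and the check ignores instruction cache misses, which the text notes cannot be predicted in software.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    kind: str                   # "load", "store", "compute" or "branch"
    dest: Optional[int] = None  # destination register number, if any
    base: Optional[int] = None  # register used to compute the memory address


def check_memory_rules(program: list[Instr]) -> list[tuple[int, str]]:
    """Return (index, message) pairs for the two memory scheduling rules."""
    findings = []
    for i, ins in enumerate(program):
        # Rule 1: a load must not overwrite its own address register.
        if ins.kind == "load" and ins.dest is not None and ins.dest == ins.base:
            findings.append((i, "load destination is its own address register"))
        # Rule 2: a store followed by a memory instruction stalls the processor.
        if ins.kind == "store" and i + 1 < len(program):
            if program[i + 1].kind in ("load", "store"):
                findings.append((i, "store followed by a memory instruction"))
    return findings
```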


3.4.  Branch  Delays 

Besides the delays that can occur because one instruction must wait for the results of a previous instruction to be
stored in a register or be bypassed, there are also delays because it takes time for a branch instruction to compute the
destination for a taken branch. These are called branch delays or branch slots. MIPS-X has two branch slots after
every branch instruction. Again, consider Figure 3-1. If instruction 1 is a branch instruction, then it is not until
instruction 4 when the processor can decide that the branch is to be taken or not to be taken.

The branch slots can be filled with two types of instructions. They can either be ones that are always executed or
ones that must be squashed if the branch does not go in the predicted direction. Squashing means that the instructions
are converted into nops by preventing their write backs from occurring. This is used if the branch goes in a direction
different from the one that was predicted. This mechanism is described in more detail in Section 4.3.
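
The effect of squashing on the two branch slots can be sketched as follows. The encoding of the prediction and of the squash option is defined in Section 4.3 and is not modelled here; the parameter names below are hypothetical.

```python
def resolve_branch_slots(
    slots: list[str],
    squash_on_mispredict: bool,
    predicted_taken: bool,
    taken: bool,
) -> list[str]:
    """Return the instructions that take effect in the branch slots.

    Slot instructions are replaced by nops only when the branch allows
    squashing and goes in the direction that was not predicted.
    """
    if squash_on_mispredict and predicted_taken != taken:
        return ["nop"] * len(slots)
    return list(slots)
```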

3.5.  Jump  Delays 

The  computation  of  a  jump  destination  address  means  that  there  are  two  delay  slots  after  a  jump  instruction  before 
the  program  can  begin  executing  at  the  new  address.  The  computation  uses  the  ALU  to  compute  the  jump  address  so 
the  result  is  not  available  to  the  PC  until  the  end  of  the  ALU  cycle.  Unlike  branches  however,  the  instructions  in  the 
delay  slots  are  always  executed  and  never  squashed. 


3.6.  Detailed  Instruction  Timings 

This section describes the timing of the instructions as they flow through the data path. It does not describe the
controls of the datapath and the timing required to set them up. These timing descriptions are intended to make more
clear the programmer's view of how each instruction is executed. The description of each instruction given in the later
sections is generally insufficient when it is necessary to know the possible interactions of various instructions.

The timing for what happens during an exception is not described here. Appendix VI discusses the handling of
exceptions.

The notation that will be used to describe the instruction timings will be shown first and then the execution of a
normal instruction will be given. The timing for each type of instruction is then described in more detail. Finally, the
timing for mstep and dstep are treated separately. These are the multiply and divide step instructions. They do not fit in
with the other types of compute instructions because they use the MD register.

3.6.1.  Notation 

The description of each type of instruction will show what parts of the datapath are active and what they are doing
for the instruction during each phase of execution. The notation that is used is:

IF, RF, ALU, MEM, WB
                  These are the names of the pipestages as described in Table 3-1.

IF-1              This is the clock cycle before the IF cycle of the instruction being considered.

φ1                Phase 1 of the clock cycle.

φ2                Phase 2 of the clock cycle.

rSrc1, rSrc2      Register values on the Src1 and Src2 buses, corresponding to the Source 1 and Source 2 addresses
                  specified in the instruction.

rDest             Value to be written into the destination register specified by the Destination field of the instruction.
                  The Src1 bus is used.

aluSrc1, aluSrc2  ALU latches corresponding to the values on the Src1 and Src2 buses, respectively.

IR                The instruction register.

MDRin             Memory data register for values coming onto the chip.

MDRout            Memory data register for values going off chip.

rResult           The result register.

PCsource          The PC source to be used for this instruction. It will be one of: the displacement adder, the trap
                  vector, the incrementer, the ALU or from the PC chain.

PCinc             The value from the PC incrementer.

PC-4              The last value in the PC chain.

Reg<n>, Reg<n..m> Bit n or Bits n to m of register Reg.

Reg << n          Reg is shifted left n bits.

Bypass source     Either rResult or MDRin.

Icache            The on-chip instruction cache.

RFS               Reserved for Stanford.

3.6.2.  A Normal Instruction

This section will show what each part of the datapath is doing during each phase of the execution of an instruction.
The description of specific instruction types in the following sections will only describe the action of the relevant parts
of the datapath pertaining to the instruction in question.


IF-1  φ1  RFS
      φ2  PC bus <= PCsource
          Precharge tag comparators, valid bit store

IF    φ1  Do tag compare
          Valid bit store access
          Icache address decoder <= PC<26..31>
          Detect Icache hit
          Precharge Icache
      φ2  Do Icache access
          IR <= Icache

RF    φ1  Do bypass comparisons
      φ2  aluSrc1 <= rSrc1
            or aluSrc1 <= Bypass source
          aluSrc2 <= rSrc2
            or aluSrc2 <= Bypass source
            or aluSrc2 <= Offset value
          Displacement adder latch <= Displacement value
          MDRout <= rSrc2
            or MDRout <= Bypass source

ALU   φ1  Do ALU, do displacement adder (for branch and jump targets)
          Precharge Result bus
      φ2  Result bus <= ALU
          rResult <= Result bus
          Memory address pads <= Result bus (There may be a latch here)

MEM   φ1  RFS
      φ2  MDRin <= rResult
            or MDRin <= Memory data pads
            or Memory data pads <= MDRout

WB    φ1  rDest <= MDRin
      φ2  RFS

3.6.3.  Memory Instructions

These instructions do accesses to memory in the form of loads and stores. The coprocessor and floating point
instructions have exactly the same timings. The only difference is that the processor may not always source an operand
or use an operand during a coprocessor instruction.

The MDRout register is implemented as a series of registers to correctly time the output of data onto the memory
datapath. These registers are labelled MDRoutRFφ2, MDRoutALUφ1, MDRoutALUφ2 and MDRoutMEMφ1.


IF-1  φ1  RFS
      φ2  PC bus <= PCsource
          Precharge tag comparators, valid bit store

IF    φ1  Do tag compare
          Valid bit store access
          Icache address decoder <= PC<26..31>
          Detect Icache hit
          Precharge Icache
          Do incrementer (calculate next sequential instruction address)
      φ2  Do Icache access
          IR <= Icache

RF    φ1  Do bypass comparisons
      φ2  aluSrc1 <= rSrc1
            or aluSrc1 <= Bypass source
          aluSrc2 <= Offset value
          MDRoutRFφ2 <= rSrc2 (For stores)
            or MDRoutRFφ2 <= Bypass source (For stores)

ALU   φ1  Do ALU(add)
          Precharge Result bus
          MDRoutALUφ1 <= MDRoutRFφ2 (For stores)
      φ2  Result bus <= ALU
          rResult <= Result bus
          Memory address pads <= Result bus
          MDRoutALUφ2 <= MDRoutALUφ1 (For stores)

MEM   φ1  MDRoutMEMφ1 <= MDRoutALUφ2 (For stores)
      φ2  MDRin <= Memory data pads (For loads)
            or Memory data pads <= MDRoutMEMφ1 (For stores)

WB    φ1  rDest <= MDRin (For loads)
      φ2  RFS

3.6.4.  Branch Instructions

These instructions do a compare in the ALU. The PC value is taken from the displacement adder when a branch is
taken and from the incrementer when a branch is not taken.


IF-1  φ1  RFS
      φ2  PC bus <= PCsource
          Precharge tag comparators, valid bit store

IF    φ1  Do tag compare
          Valid bit store access
          Icache address decoder <= PC<26..31>
          Detect Icache hit
          Precharge Icache
          Do incrementer (calculate next sequential instruction address)
      φ2  Do Icache access
          IR <= Icache

RF    φ1  Do bypass comparisons
      φ2  aluSrc1 <= rSrc1
            or aluSrc1 <= Bypass source
          aluSrc2 <= rSrc2
            or aluSrc2 <= Bypass source
          Displacement adder <= Displacement value

ALU   φ1  Do ALU(Src1 - Src2), do displacement adder (for branch target)
          Precharge Result bus
          Evaluate condition at the end of φ1 before the rising edge of φ2
      φ2  PC bus <= Displacement adder (Branch taken)
            or PC bus <= Incrementer (Branch not taken)
          Tag compare latch <= PC bus
          rResult <= Result bus

3.6.5.  Compute Instructions

These instructions are mostly 3-operand instructions that use the ALU to do an operation. Some of them do traps or
jumps. These are treated separately in Section 3.6.6. The timing for instructions that access the special registers is
described in Section 3.6.5.1.


IF-1  φ1  RFS
      φ2  PC bus <= PCsource
          Precharge tag comparators, valid bit store

IF    φ1  Do tag compare
          Valid bit store access
          Icache address decoder <= PC<26..31>
          Detect Icache hit
          Precharge Icache
          Do incrementer (calculate next sequential instruction address)
      φ2  Do Icache access
          IR <= Icache

RF    φ1  Do bypass comparisons
      φ2  aluSrc1 <= rSrc1
            or aluSrc1 <= Bypass source
          aluSrc2 <= rSrc2
            or aluSrc2 <= Bypass source
            or aluSrc2 <= Immediate value (for Compute Immediate Instructions)

ALU   φ1  Do ALU
          Precharge Result bus
      φ2  Result bus <= ALU
          rResult <= Result bus

MEM   φ1  RFS
      φ2  MDRin <= rResult

WB    φ1  rDest <= MDRin
      φ2  RFS

3.6.5.1.  Special Instructions

These instructions (movtos and movfrs) access the special registers described in Section 2.3.


IF-1  φ1  RFS
      φ2  PC bus <= PCsource
          Precharge tag comparators, valid bit store

IF    φ1  Do tag compare
          Valid bit store access
          Icache address decoder <= PC<26..31>
          Detect Icache hit
          Precharge Icache
          Do incrementer (calculate next sequential instruction address)
      φ2  Do Icache access
          IR <= Icache

RF    φ1  Do bypass comparisons
      φ2  aluSrc1 <= rSrc1 (For movtos)
            or aluSrc1 <= Bypass source (For movtos)

ALU   φ1  Do ALU(pass Src1)
          Precharge Result bus
      φ2  Result bus <= aluSrc1 (For movtos)
            or Result bus <= Special Register (For movfrs)
          Special Register <= Result bus (For movtos)
          rResult <= Result bus

MEM   φ1  RFS
      φ2  MDRin <= rResult

WB    φ1  rDest <= MDRin (For movfrs)
      φ2  RFS

3.6.6.  Jump  Instructions 


IF-1  φ1  RFS
      φ2  PC bus <= PCsource
          Precharge tag comparators, valid bit store

IF    φ1  Do tag compare
          Valid bit store access
          Icache address decoder <= PC<26..31>
          Detect Icache hit
          Precharge Icache
          Do incrementer (calculate next sequential instruction address)
      φ2  Do Icache access
          IR <= Icache

RF    φ1  Do bypass comparisons
      φ2  aluSrc1 <= rSrc1
            or aluSrc1 <= Bypass source
          aluSrc2 <= Immediate value (For jspci)

ALU   φ1  Do ALU(add)
          Precharge Result bus
      φ2  Result bus <= PCinc (For jspci)
          PC bus <= ALU (For jspci)
            or PC bus <= PC-4, shift PC chain (For jpc and jpcrs)
            or PC bus <= Trap vector (For trap)
          PSWcurrent <= PSWother (For jpcrs)
          rResult <= Result bus

MEM   φ1  RFS
      φ2  MDRin <= rResult

WB    φ1  rDest <= MDRin (For jspci)
      φ2  RFS

3.6.7.  Multiply Step - mstep

The MD register is implemented as a series of registers. They are called MDresultφ2, MDresultφ1,
MDmdrinφ2, and MDwbφ1. The names reflect the names of the bypass registers used when bypassing to the register
file. The special register that is visible for reading and writing is MDresultφ2. This chain of registers is necessary for
restarting the sequence after an exception. MDwbφ1 contains the true value of MD. When an interrupt occurs, the
write-back into this register is stopped just like write-backs to a register in the register file. The value in this register is
needed to restart the sequence. One cycle after an interrupt is taken, the contents of MDwbφ1 are available in
MDresultφ2. This value has to be saved if the interrupt routine does any multiplication or division.

The  mstart  instruction  has  similar  timing  with  a  different  ALU  operation. 


There  must  be  one  instruction  between  the  instruction  that  loads  the  MD  register  and  the  first  instruction  that  uses  the 
MD  register.  This  occurs  when  starting  a  multiplication  or  division  routine  and  when  restarting  after  an  interrupt. 


IF-1  φ1  RFS
      φ2  PC bus <= PCsource
          Precharge tag comparators, valid bit store

IF    φ1  Do tag compare
          Valid bit store access
          Icache address decoder <= PC<26..31>
          Detect Icache hit
          Precharge Icache
          Do incrementer (calculate next sequential instruction address)
      φ2  Do Icache access
          IR <= Icache

RF    φ1  Do bypass comparisons
      φ2  aluSrc1 <= rSrc1 << 1
            or aluSrc1 <= Bypass source << 1
          aluSrc2 <= rSrc2

ALU   φ1  Do ALU(add)
          Latch aluSrc1
          Precharge Result bus
      φ2  Result bus <= ALU (MSB(MDresultφ1) is 1)
            or Result bus <= aluSrc1 (MSB(MDresultφ1) is 0)
          rResult <= Result bus
          MDresultφ2 <= MDresultφ1 << 1

MEM   φ1  MDresultφ1 <= MDresultφ2
      φ2  MDRin <= rResult
          MDmdrinφ2 <= MDresultφ1

WB    φ1  rDest <= MDRin
          MDwbφ1 <= MDmdrinφ2
      φ2  RFS
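
Read at the architectural level, the timing above describes one step of a shift-and-add multiplication: the accumulator is shifted left, the multiplicand is added when the MSB of MD is 1, and MD is shifted left. The sketch below models that on unsigned 32-bit values only. It assumes a single MD register (no pipeline chain), does not model mstart or signed operands, and uses hypothetical function names.

```python
MASK32 = 0xFFFFFFFF


def mstep(acc: int, multiplicand: int, md: int) -> tuple[int, int]:
    """One multiply step: return (new accumulator, new MD)."""
    shifted = (acc << 1) & MASK32                 # aluSrc1 <= rSrc1 << 1
    if (md >> 31) & 1:                            # MSB(MD) is 1: take the ALU sum
        result = (shifted + multiplicand) & MASK32
    else:                                         # MSB(MD) is 0: pass aluSrc1
        result = shifted
    return result, (md << 1) & MASK32             # MD <= MD << 1


def multiply(multiplier: int, multiplicand: int) -> int:
    """Low 32 bits of an unsigned product, computed with 32 multiply steps."""
    acc, md = 0, multiplier & MASK32
    for _ in range(32):
        acc, md = mstep(acc, multiplicand & MASK32, md)
    return acc
```

The multiplier is consumed from its most significant bit downward, so after 32 steps the accumulator holds the product modulo 2^32.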

3.6.8.  Divide Step - dstep

The  MD  register  is  also  used  for  this  instruction.  See  Section  3.6.7  for  a  description  of  its  implementation  and  the 
notation  used. 

IF-1  φ1  RFS
      φ2  PC bus <= PCsource
          Precharge tag comparators, valid bit store

IF    φ1  Do tag compare
          Valid bit store access
          Icache address decoder <= PC<26..31>
          Detect Icache hit
          Precharge Icache
          Do incrementer (calculate next sequential instruction address)
      φ2  Do Icache access
          IR <= Icache

RF    φ1  Do bypass comparisons
      φ2  aluSrc1 <= rSrc1 << 1 + MSB(MDresultφ1)
            or aluSrc1 <= Bypass source << 1 + MSB(MDresultφ1)
          aluSrc2 <= rSrc2

ALU   φ1  Do ALU(sub)
          Precharge Result bus
      φ2  Result bus <= ALU (MSB(ALU result) is 0)
            or Result bus <= aluSrc1 (MSB(ALU result) is 1)
          rResult <= Result bus
          MDresultφ2 <= MDresultφ1 << 1 + Complement of MSB(ALU result)

MEM   φ1  MDresultφ1 <= MDresultφ2
      φ2  MDRin <= rResult
          MDmdrinφ2 <= MDresultφ1

WB    φ1  rDest <= MDRin
          MDwbφ1 <= MDmdrinφ2
      φ2  RFS



4.  Instruction  Set 

There are four different types of instructions. They are memory instructions, branch instructions, compute
instructions, and compute immediate instructions. Coprocessor instructions are part of the memory instructions.

4.1.  Notation 

This section explains the notation used in the descriptions of the instructions.

MSB(x)              The most significant bit of x.

x << y              x is shifted left by y bits.

x >> y              x is shifted right by y bits.

x#y                 x is a number represented in base y.

x || y              x is concatenated with y.

PCcurrent           Address of the instruction being fetched during the ALU cycle of an instruction.

PCnext              Address of the next instruction to be fetched.

Reg(n)              The contents of CPU register n.

FReg(n)             The contents of register n in the floating point unit (FPU).

Reg<n>, Reg<n..m>   Bit n or bits n to m of register Reg.

Memory[addr]        The contents of memory at the location addr. The value accessed is always a word of 32 bits.

SignExtend(n)       The value of n sign extended to 32 bits. The size of n is specified by the field being sign extended.

rSrc1               The register number used as the Source 1 operand.

rSrc2               The register number used as the Source 2 operand.

rDest               The register number used as the Destination location.

fSrc1               The register number used as the Source 1 floating point operand.

fSrc2               The register number used as the Source 2 floating point operand.

fDest               The register number used as the Destination floating point register.

CopI                Coprocessor instruction.

MAR                 The memory address register. The contents of this register are placed on the address pins of the
                    processor.

MDR                 The memory data register. The address pads of the processor always reflect the contents of this
                    register.

4.2.  Memory  Instructions 

The memory instructions are the ones that do an external memory cycle. The most commonly used memory
instructions are load and store. The other instructions that are part of the memory instructions are the coprocessor
instructions. They do not always generate a memory cycle that is recognised by memory. Instead the coprocessor uses
the cycle. This is explained in more detail in the individual instruction descriptions.
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4.2.1.  Id  -  Load 


Instruction format fields: TY | OP | Src1 | Dest | Offset(17)


Assembler

ld Offset[rSrc1],rDest

Operation

Reg(Dest) <= Memory[SignExtend(Offset) + Reg(Src1)]

Description

The offset field is sign extended and added to the contents of the register specified by the Src1 field to compute a
memory address. The contents of that memory location are put into Reg(Dest).

Note: An instruction in the slot of a load instruction that uses the same register that the load instruction is loading is
not guaranteed to get the correct result. Do not try to use the load slots in this manner.
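
The address calculation shared by the load and store instructions can be sketched in Python as follows. This is only an illustration of the Operation line above; the helper names sign_extend and effective_address are hypothetical and not part of the manual, and registers are modelled as plain 32-bit integers.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Sign extend a `bits`-wide field to a 32-bit two's-complement word."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign) - sign) & MASK32

def effective_address(offset17, src1):
    """Byte address SignExtend(Offset) + Reg(Src1), wrapped to 32 bits."""
    return (sign_extend(offset17, 17) + src1) & MASK32
```

A 17-bit offset of all ones therefore acts as -1, so negative byte offsets from the base register are possible.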


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4.2.2.  st  -  Store 


Instruction format fields: TY | OP | Src1 | Src2 | Offset(17)


Assembler

st Offset[rSrc1],rSrc2

Operation

Memory[SignExtend(Offset) + Reg(Src1)] <= Reg(Src2)

Description

The offset field is sign extended and added to the contents of the register specified by the Src1 field to compute a
memory address. The contents of Reg(Src2) are stored at that memory location.

This instruction requires 2 memory cycles, one to read the cache and then one to do the store. To obtain maximum
performance, instructions that do not require a memory cycle should be scheduled after a store instruction if possible.
Otherwise, the processor may stall for one cycle.


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4.2.3.  Idf  -  Load  Floating  Point 


Instruction format fields: TY | OP | Src1 | Dest | Offset(17)


Assembler

ldf Offset[rSrc1],fDest

Operation

FReg(Dest) <= Memory[SignExtend(Offset) + Reg(Src1)]

Description

The offset field is sign extended and added to the contents of the register specified by the Src1 field to compute a
memory address. The contents of that memory location are put into the register specified by Dest in the floating point
unit (FReg(Dest)). The CPU ignores the data returned in the memory cycle.

Note: An instruction in the slot of a load instruction that uses the same register that the load instruction is loading is
not guaranteed to get the correct result. Do not try to use the load slots in this manner.

Note:  If  a  processor  configuration  does  not  have  an  FPU  then  different  code  must  be  generated  to  emulate  the 
floating  point  instructions.  Any  code  that  tries  to  use  FPU  instructions  when  there  is  no  FPU  will  not  execute  correctly. 


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4.2.4.  stf  -  Store  Floating  Point 


Instruction format fields: TY | OP | Src1 | Src2 | Offset(17)


Assembler

stf Offset[rSrc1],fSrc2

Operation

Memory[SignExtend(Offset) + Reg(Src1)] <= FReg(Src2)

Description

The offset field is sign extended and added to the contents of the register specified by the Src1 field to compute a
memory address. The contents of the floating point register specified by Src2 are stored at that memory location. The
CPU does not put out any data during this write memory cycle.

Note:  If  a  processor  configuration  does  not  have  an  FPU  then  different  code  must  be  generated  to  emulate  the 
floating  point  instructions.  Any  code  that  tries  to  use  FPU  instructions  when  there  is  no  FPU  will  not  execute  correctly. 


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4.25.  Idt  -  Load  Through 


Instruction format fields: TY | OP | Src1 | Dest | Offset(17)

Assembler

ldt Offset[rSrc1],rDest

Operation

Reg(Dest) <= Memory[SignExtend(Offset) + Reg(Src1)]

Description

This instruction is the same as ld except that it is guaranteed to bypass the cache. There is no check to see whether
the location being accessed currently exists in the cache.

The offset field is sign extended and added to the contents of the register specified by the Src1 field to compute a
memory address. The contents of that memory location are put into Reg(Dest).

Note: An instruction in the slot of a load instruction that uses the same register that the load instruction is loading is
not guaranteed to get the correct result. Do not try to use the load slots in this manner.




4.2.6.  stt  -  Store  Through 


Instruction format fields: TY | OP | Src1 | Src2 | Offset(17)


Assembler

stt Offset[rSrc1],rSrc2

Operation

Memory[SignExtend(Offset) + Reg(Src1)] <= Reg(Src2)

Description

This instruction is the same as st except that it is guaranteed to bypass the cache. There is no check to see whether
the location being accessed currently exists in the cache.

The offset field is sign extended and added to the contents of the register specified by the Src1 field to compute a
memory address. The contents of Reg(Src2) are stored at that memory location.


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4.2.7.  movfrc  -  Move  From  Coprocessor 


Instruction format fields: TY | OP | Src1(r0) | Dest | CopI
Possible layout of the CopI field: COP# | Func | CS1 | CS2/CD


Assembler

movfrc CopI,rDest

Operation

MAR <= SignExtend(CopI) + Reg(Src1)

Reg(Dest) <= MDR

Description

This instruction is used to do a Coprocessor register to CPU register move.

The CopI field is sign extended and added to the contents of the register specified by the Src1 field. The Src1 field
should be Register 0 if the CopI field is to be unmodified (hackers take note). The CopI field will appear on the address
lines of the processor where it can be read by the coprocessor. The coprocessor will place a value on the data bus that
will be stored in Reg(Dest) of the CPU. The memory system will ignore this memory cycle.

The CopI field is decoded by the coprocessors to find the coprocessor being addressed (COP#) and the function to be
performed. A possible format is shown above. The fields CS1 and CS2/CD show possible coprocessor register fields.
The format is flexible except that all coprocessors should find the COP# in the same place.

Note: An instruction in the slot of a movfrc instruction that uses the same register that the movfrc instruction is
loading is not guaranteed to get the correct result. Do not try to use the slots in this manner.


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4.2.8.  movtoc  -  Move  To  Coprocessor 


Instruction format fields: TY | OP | Src1(r0) | Src2 | CopI
Possible layout of the CopI field: COP# | Func | CS1 | CS2/CD


Assembler

movtoc CopI,rSrc2

Operation

MAR <= SignExtend(CopI) + Reg(Src1)

MDR <= Reg(Src2)

Description

This instruction is used to do a CPU register to Coprocessor register move.

The CopI field is sign extended and added to the contents of the register specified by the Src1 field. The Src1 field
should be Register 0 if the CopI field is to be unmodified (hackers take note). The CopI field will appear on the address
lines of the processor where it can be read by the coprocessor. The contents of register Src2 are placed on the data lines
so that the coprocessor can access the value. The memory system will ignore this memory cycle.

The CopI field is decoded by the coprocessors to find the coprocessor being addressed (COP#) and the function to be
performed. A possible format is shown above. The fields CS1 and CS2/CD show possible coprocessor register fields.
The format is flexible except that all coprocessors should find the COP# in the same place.


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4.2.9.  aluc  -  Coprocessor  ALU 


Instruction format fields: TY | OP | Src1(r0) | 00000 | CopI
Possible layout of the CopI field: COP# | Func | CS1 | CS2/CD


Assembler

aluc CopI

Operation

MAR <= SignExtend(CopI) + Reg(Src1)

Description

This instruction is used to execute a coprocessor instruction that does not require the transfer of data to or from the
CPU.

This instruction is actually implemented as:
    movfrc CopI,r0

The CopI field is sign extended and added to the contents of the register specified by the Src1 field. The Src1 field
should be Register 0 if the CopI field is to be unmodified (hackers take note). The CopI field will appear on the address
lines of the processor where it can be read by the coprocessor. The memory system will ignore this memory cycle.

The CopI field is decoded by the coprocessors to find the coprocessor being addressed (COP#) and the function to be
performed. A possible format is shown above. The fields CS1 and CS2/CD show possible coprocessor register fields.
The format is flexible except that all coprocessors should find the COP# in the same place.

Note  that  this  instruction  is  needed  to  perform  floating  point  ALU  operations.  Only  floating  point  loads  and  stores 
have  special  FPU  instructions. 


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4.3.  Branch  Instructions 

As described previously in Section 3.4, all branch instructions have two delay slots. The instructions placed in the
slots can be either ones that must always execute or ones that should be executed if the branch is taken. There are two
flavours of branch instructions that must be used depending on the type of instructions placed in the slots. They are:

No squash:           The instructions in the slots are always executed. They are never squashed (turned into nops).

Squash if don't go:  All branches are statically predicted to go (be taken). This means that the instructions in the
                     branch slots should be instructions from the target instruction stream. If the branch is not
                     taken, then the instructions in the slots are squashed.

The instructions in the slots must both be of the same type. That is, they should both always execute or both be from
the target instruction stream. If squashing takes place, both instructions in the slots are treated equally.

Note that for best performance, it is best to try to find instructions that can always execute and use the no squash
branch types.

Branch  instructions  can  be  put  in  the  slot  of  branches  that  can  be  squashed. 

The branch conditions are established by testing the result of
    Reg(Src1) - Reg(Src2)

where Src1 and Src2 are specified in the branch instruction. The condition to be tested is specified in the COND field
of the branch instruction. The expressions used to derive the conditions use the following notation:

N    Bit 0 of the result is a 1. The result is negative.

Z    The result is 0.

V    32-bit 2's-complement overflow has occurred in the result.

C    A carry bit was generated from bit 0 of the result in the ALU.

⊕    Exclusive-Or

Some branch conditions that are usually found on other machines do not exist on MIPS-X. They can be synthesized
by reversing the order of the operands or comparing with Reg(0) in Source 2 (Src2=0). These branches are shown in
Table 4-1 along with the existing branches.


Branch   Description                       Expression           Branch To Use If Synthesized

beq      Branch if equal                   Z
bge      Branch if greater than or equal   not (N ⊕ V)
bgt      Branch if greater than            not ((N ⊕ V) + Z)    blt (rev ops)
bhi      Branch if higher                  not (not C + Z)      blo (rev ops)
bhs      Branch if higher or same          C
ble      Branch if less than or equal      (N ⊕ V) + Z          bge (rev ops)
blo      Branch if lower than              not C
blos     Branch if lower or same           not C + Z            bhs (rev ops)
blt      Branch if less than               N ⊕ V
bne      Branch if not equal               not Z
bpl      Branch if plus                    not N                bge (cmp to Src2=0)
bmi      Branch if minus                   N                    blt (cmp to Src2=0)
bra      Branch always                                          beq r0,r0

Table 4-1: Branch Instructions
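
As an illustration of how the conditions in Table 4-1 follow from the subtraction Reg(Src1) - Reg(Src2), the following Python sketch derives N, Z, V and C and evaluates the six branches that exist in hardware. It assumes that the subtraction is performed as Src1 plus the complement of Src2 plus one, so that C is set when no borrow occurs; that convention is an assumption chosen to agree with bhs testing C. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def compare_flags(src1, src2):
    """Return (N, Z, V, C) for the 32-bit subtraction src1 - src2."""
    a, b = src1 & MASK32, src2 & MASK32
    total = a + ((~b) & MASK32) + 1        # a - b computed as a + ~b + 1
    result = total & MASK32
    n = result >> 31                       # sign bit (bit 0 in the manual's numbering)
    z = int(result == 0)
    c = total >> 32                        # carry out of the most significant bit
    v = ((a ^ b) & (a ^ result)) >> 31     # 2's-complement overflow
    return n, z, v, c

def branch_taken(cond, src1, src2):
    """Evaluate one of the hardware branch conditions."""
    n, z, v, c = compare_flags(src1, src2)
    return {
        "beq": z == 1,
        "bne": z == 0,
        "bge": (n ^ v) == 0,
        "blt": (n ^ v) == 1,
        "bhs": c == 1,
        "blo": c == 0,
    }[cond]
```

For example, with Src1 = FFFFFFFF#16 and Src2 = 1, blt is taken (a signed compare of -1 with 1) while blo is not (an unsigned compare).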




4.3.1.  beq  -  Branch  If  Equal 


s = 1 => Squash if don't go
s = 0 => No squashing


Assembler

beq rSrc1,rSrc2,Label      ; No squashing

beqsq rSrc1,rSrc2,Label    ; Squash if don't go

Operation

If [Reg(Src1) - Reg(Src2)] => Z
then
    PCnext <= PCcurrent + SignExtend(Disp)

Description

If Reg(Src1) equals Reg(Src2) then execution continues at Label and the two delay slot instructions are executed.
The value of Label is computed by adding the signed displacement to PCcurrent.

If Reg(Src1) does not equal Reg(Src2), then the delay slot instructions are executed for beq and squashed for beqsq.


4.3.2.  bge  -  Branch  If  Greater  than  or  Equal 


Instruction format fields: TY | Cond | Src1 | Src2 | SQ | Disp(16)

s = 1 => Squash if don't go
s = 0 => No squashing

Assembler

bge rSrc1,rSrc2,Label      ; No squashing

bgesq rSrc1,rSrc2,Label    ; Squash if don't go

Operation

If [Reg(Src1) - Reg(Src2)] => not (N ⊕ V)
then
    PCnext <= PCcurrent + SignExtend(Disp)

Description

This is a signed compare.

If Reg(Src1) is greater than or equal to Reg(Src2) then execution continues at Label and the two delay slot
instructions are executed. The value of Label is computed by adding the signed displacement to PCcurrent.

If Reg(Src1) is less than Reg(Src2), then the delay slot instructions are executed for bge and squashed for bgesq.


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4.3.3.  bhs  -  Branch  If  Higher  Or  Same 


s = 1 => Squash if don't go
s = 0 => No squashing


Assembler

bhs rSrc1,rSrc2,Label      ; No squashing

bhssq rSrc1,rSrc2,Label    ; Squash if don't go

Operation

If [Reg(Src1) - Reg(Src2)] => C
then
    PCnext <= PCcurrent + SignExtend(Disp)

Description

This is an unsigned compare.

If Reg(Src1) is higher than or equal to Reg(Src2) then execution continues at Label and the two delay slot
instructions are executed. The value of Label is computed by adding the signed displacement to PCcurrent.

If Reg(Src1) is lower than Reg(Src2), then the delay slot instructions are executed for bhs and squashed for bhssq.


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4.3.4.  blo  -  Branch  If  Lower  Than 


Instruction format fields: TY | Cond | Src1 | Src2 | SQ | Disp(16)

s = 1 => Squash if don't go
s = 0 => No squashing

Assembler

blo rSrc1,rSrc2,Label      ; No squashing

blosq rSrc1,rSrc2,Label    ; Squash if don't go

Operation

If [Reg(Src1) - Reg(Src2)] => not C
then
    PCnext <= PCcurrent + SignExtend(Disp)

Description

This is an unsigned compare.

If Reg(Src1) is lower than Reg(Src2) then execution continues at Label and the two delay slot instructions are
executed. The value of Label is computed by adding the signed displacement to PCcurrent.

If Reg(Src1) is higher than or equal to Reg(Src2) or if there was a carry generated, then the delay slot instructions are
executed for blo and squashed for blosq.


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4.3.5.  bit  -  Branch  If  Less  Than 


Instruction format fields: TY | Cond | Src1 | Src2 | SQ | Disp(16)

s = 1 => Squash if don't go
s = 0 => No squashing


Assembler

blt rSrc1,rSrc2,Label      ; No squashing

bltsq rSrc1,rSrc2,Label    ; Squash if don't go

Operation

If [Reg(Src1) - Reg(Src2)] => N ⊕ V
then
    PCnext <= PCcurrent + SignExtend(Disp)

Description

This is a signed compare.

If Reg(Src1) is less than Reg(Src2) then execution continues at Label and the two delay slot instructions are
executed. The value of Label is computed by adding the signed displacement to PCcurrent.

If Reg(Src1) is greater than or equal to Reg(Src2), then the delay slot instructions are executed for blt and squashed
for bltsq.




4.3.6.  bne  -  Branch  If  Not  Equal 


Instruction format fields: TY | Cond | Src1 | Src2 | SQ | Disp(16)

s = 1 => Squash if don't go
s = 0 => No squashing

Assembler

bne rSrc1,rSrc2,Label      ; No squashing

bnesq rSrc1,rSrc2,Label    ; Squash if don't go

Operation

If [Reg(Src1) - Reg(Src2)] => not Z
then
    PCnext <= PCcurrent + SignExtend(Disp)

Description

If Reg(Src1) does not equal Reg(Src2) then execution continues at Label and the two delay slot instructions are
executed. The value of Label is computed by adding the signed displacement to PCcurrent.

If Reg(Src1) equals Reg(Src2), then the delay slot instructions are executed for bne and squashed for bnesq.




4.4.  Compute  Instructions 

Most of the compute instructions are 3-operand instructions that use the ALU or the shifter to perform an operation
on the contents of 2 registers and store the result in a third register.


Assembler

add rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= Reg(Src1) + Reg(Src2)

Description

The sum of the contents of the two source registers is stored in the destination register.
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4.4.2.  dstep  -  Divide  Step 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 000101100110


Assembler

dstep rSrc1,rSrc2,rDest

Operation

Src1 should be the same as Dest.

ALUsrc1 <= (Reg(Src1) << 1) + MSB(Reg(MD))

ALUsrc2 <= Reg(Src2)

ALUoutput <= ALUsrc1 - ALUsrc2

If MSB(ALUoutput) is 1
then
    Reg(Dest) <= ALUsrc1
    Reg(MD) <= Reg(MD) << 1
else
    Reg(Dest) <= ALUoutput
    Reg(MD) <= (Reg(MD) << 1) + 1

Description

This is one step of a 1-bit restoring division algorithm. The division scheme is described in Appendix IV.
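
The Operation above can be modelled in a few lines of Python. The dstep function follows the register transfers as written; the surrounding divide loop (MD preloaded with the dividend, the destination register cleared, 32 steps) is an assumption made for illustration, since the actual routine is given in Appendix IV. The sketch also assumes an unsigned dividend and a divisor in the range 1 to 2^31 - 1.

```python
MASK32 = 0xFFFFFFFF

def dstep(src1, src2, md):
    """One divide step. Returns (new Reg(Dest), new Reg(MD))."""
    alu_src1 = ((src1 << 1) | (md >> 31)) & MASK32
    alu_out = (alu_src1 - src2) & MASK32
    if alu_out >> 31:                      # result negative: keep (restore) ALUsrc1
        return alu_src1, (md << 1) & MASK32
    return alu_out, ((md << 1) | 1) & MASK32

def divide(dividend, divisor):
    """Hypothetical driver: 32 dsteps, assuming 0 < divisor < 2**31."""
    rem, md = 0, dividend & MASK32
    for _ in range(32):
        rem, md = dstep(rem, divisor, md)
    return md, rem                         # quotient ends up in MD, remainder in Dest
```

Each step shifts one dividend bit out of MD into the partial remainder and shifts one quotient bit into the bottom of MD, which is why a single register can hold both.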


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4.4.3.  mstart  -  Multiply  Startup 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 000011100110


Assembler

mstart rSrc2,rDest

Operation

If MSB(Multiplier loaded in Reg(MD)) is 1
then
    Reg(Dest) <= 0 - Reg(Src2)
    Reg(MD) <= Reg(MD) << 1
else
    Reg(Dest) <= 0
    Reg(MD) <= Reg(MD) << 1

Description

This is the first step of a 1-bit shift and add multiplication algorithm used when doing signed multiplication. If the
most significant bit of the multiplier is 1, then the multiplicand is subtracted from 0 and the result is stored in
Reg(Dest). The multiplication scheme is described in Appendix IV.


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4.4.4.  mstep  -  Multiply  Step 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 000010011001


Assembler

mstep rSrc1,rSrc2,rDest

Operation

Src1 should be the same as Dest.

If MSB(Reg(MD)) is 1
then
    Reg(Dest) <= (Reg(Src1) << 1) + Reg(Src2)
    Reg(MD) <= Reg(MD) << 1
else
    Reg(Dest) <= Reg(Src1) << 1
    Reg(MD) <= Reg(MD) << 1

Description

This is one step of a 1-bit shift and add multiplication algorithm. The multiplication scheme is described in
Appendix IV.
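
A minimal Python model of the multiply instructions is sketched below. The mstart and mstep functions follow the Operation sections of the two instructions; the multiply driver (one mstart followed by 31 msteps, with the multiplier preloaded in MD) is an assumption for illustration, and only the low 32 bits of the product are produced. The actual routine is the one in Appendix IV.

```python
MASK32 = 0xFFFFFFFF

def mstart(src2, md):
    """First step of a signed multiply. Returns (Reg(Dest), Reg(MD))."""
    dest = (0 - src2) & MASK32 if md >> 31 else 0
    return dest, (md << 1) & MASK32

def mstep(src1, src2, md):
    """One shift-and-add step. Returns (Reg(Dest), Reg(MD))."""
    dest = (src1 << 1) & MASK32
    if md >> 31:
        dest = (dest + src2) & MASK32
    return dest, (md << 1) & MASK32

def multiply(multiplicand, multiplier):
    """Hypothetical driver: low 32 bits of a signed 32 x 32 product."""
    x = multiplicand & MASK32
    acc, md = mstart(x, multiplier & MASK32)
    for _ in range(31):
        acc, md = mstep(acc, x, md)
    return acc
```

The mstart step handles the sign bit of the multiplier, which carries a weight of -2^31 in 2's-complement, by subtracting the multiplicand; the remaining 31 bits are ordinary shift and add steps.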


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4.4.5.  sub  -  Subtract 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 000001100110


Assembler

sub rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= Reg(Src1) - Reg(Src2)

Description

The Source 2 register is subtracted from the Source 1 register and the difference is stored in the Destination register.


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4.4.6.  subnc  -  Subtract  with  No  Carry  In 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 000000100110


Assembler

subnc rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= Reg(Src1) + (1's complement of Reg(Src2))

Description

The 1's complement of the Source 2 register is added to the Source 1 register and the result is stored in the
Destination register. This instruction is used when doing multiprecision subtraction.

The following is an example of double precision subtraction. The operation required is C = A - B, where A, B and
C are double word values.

        subnc   rAhi,rBhi,rChi      ;subtract high words
        bhssq   rAlo,rBlo,l1        ;check if subtract of low
                                    ;words generates a carry
                                    ;branch if carry set
        addi    rChi,#1,rChi        ;add 1 to high word if carry
        nop
l1:     sub     rAlo,rBlo,rClo      ;subtract low words
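
The effect of the sequence above can be checked with a small Python model. This is only a sketch: the function names are hypothetical, and the squashing branch bhssq is modelled as a plain conditional that executes the addi only when the unsigned compare of the low words succeeds.

```python
MASK32 = 0xFFFFFFFF

def subnc(src1, src2):
    """Reg(Src1) plus the 1's complement of Reg(Src2), with no carry in."""
    return (src1 + (~src2 & MASK32)) & MASK32

def double_sub(a_hi, a_lo, b_hi, b_lo):
    """C = A - B on two-word values, following the example sequence."""
    c_hi = subnc(a_hi, b_hi)          # A_hi - B_hi - 1
    if a_lo >= b_lo:                  # bhssq taken: low subtract generates a carry
        c_hi = (c_hi + 1) & MASK32    # addi in the delay slot adds the 1 back
    c_lo = (a_lo - b_lo) & MASK32     # sub at label l1
    return c_hi, c_lo
```

When the low words borrow, the branch is not taken, the slot instructions are squashed, and the high word keeps the extra -1 that subnc left in place.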

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4.4.7.  and  -  Logical  And 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 000000100011


Assembler

and rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= Reg(Src1) bitwise and Reg(Src2)

Description

This is a bitwise logical and of the bits in Source 1 and Source 2. The result is placed in Destination.
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4.4.8.  bic  -  Bit  Clear 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12)


Assembler

bic rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= (complement of Reg(Src1)) bitwise and Reg(Src2)

Description

Each bit that is set in Source 1 is cleared in Source 2. The result is placed in Destination.




4.4.10.  or  -  Logical  Or 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 000000111011


Assembler

or rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= Reg(Src1) bitwise or Reg(Src2)

Description

This is a bitwise logical or of the bits in Source 1 and Source 2. The result is placed in Destination.


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4.4.11.  xor  -  Exclusive  Or 


Assembler 

xor rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= Reg(Src1) bitwise exclusive-or Reg(Src2)

Description

This is a bitwise exclusive-or of the bits in Source 1 and Source 2. The result is placed in Destination.
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4.4.12.  mov  -  Move  Register  to  Register 


Instruction format fields: TY | OP | Src1 | 00000 | Dest | Comp Func(12) = 000000011001


Assembler

mov rSrc1,rDest

Operation

Reg(Dest) <= Reg(Src1)

Description

This is a register to register move. It is implemented as
    add rSrc1,r0,rDest

This mnemonic is provided for convenience and clarity.


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4.4.13.  asr  -  Arithmetic  Shift  Right 


Instruction format fields: TY | OP | Src1 | 00000 | Dest | Comp Func(12) = 00010bbbdddd


Assembler

asr rSrc1,rDest,#shift amount

Operation

Reg(Dest) <= Reg(Src1) >> shift amount (See below for explanation of shift amount)

The high order bits are sign extended.

Description

The contents of Source 1 are arithmetically shifted right by shift amount. The sign of the result is the same as the
sign of Source 1. The result is stored in Destination. The range of shifts is from 1 to 32.

To determine the encoding for the shift amount, first subtract the shift amount from 32. The result can be encoded as
5 bits. Assume the 5-bit encoding is bbbef, where bbb is used in the final encoding. The bottom two bits (ef) are fully
decoded to yield dddd in the following way:

    ef    dddd
    00    0001
    01    0010
    10    0100
    11    1000

For example, to determine the bits required to specify the shift amount for the shift instruction
    asr r4,r3,#5

first do (32-5) to get 27 and then encode 27 according to the above to get 1101000.
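
The encoding rule can be written as a short Python function. This is a sketch of the procedure described above; the function name encode_shift_amount is hypothetical, and the result is returned as a 7-character bit string bbbdddd.

```python
def encode_shift_amount(shift):
    """Return the 7-bit bbbdddd encoding of a shift amount in the range 1 to 32."""
    if not 1 <= shift <= 32:
        raise ValueError("shift amount must be in the range 1 to 32")
    value = 32 - shift          # 0..31, the 5-bit quantity bbbef
    bbb = value >> 2            # top three bits are used unchanged
    ef = value & 0b11           # bottom two bits
    dddd = 1 << ef              # fully decoded (one-hot) form of ef
    return f"{bbb:03b}{dddd:04b}"
```

For the example above, encode_shift_amount(5) gives "1101000".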


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4.4.14.  rotlb  -  Rotate  Left  by  Bytes 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 000011000000


Assembler

rotlb rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= Reg(Src1) rotated left by Reg(Src2)<30..31> bytes

Description

This instruction rotates left the contents of Source 1 by the number of bytes specified in bit 30 and bit 31 of Source 2.
For example,

    Reg(Src1) = AB01CD23#16
    Reg(Src2) = 51#16

    rotlb rSrc1,rSrc2,rDest

    Reg(Dest) = 01CD23AB#16


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4.4.15.  rotlcb  -  Rotate  Left  Complemented  by  Bytes 


Assembler 

rotlcb rSrc1,rSrc2,rDest

Operation

Reg(Dest) <= Reg(Src1) rotated left by BitComplement[Reg(Src2)<30..31>] bytes

Description

This instruction rotates left the contents of Source 1 by the number of bytes specified by using the bit complement of
bits 30 and 31 in Source 2. For example,

    Reg(Src1) = AB01CD23#16
    Reg(Src2) = 51#16

    rotlcb rSrc1,rSrc2,rDest

    Rotate amount is BitComplement of 01#2 = 10#2 = 2.

    Reg(Dest) = CD23AB01#16
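
Both byte rotate instructions can be modelled with one helper in Python. The sketch below is illustrative only and its function names are hypothetical. Because bits are numbered from the most significant end, Reg(Src2)<30..31> is the two least significant bits of Source 2.

```python
MASK32 = 0xFFFFFFFF

def rotl_bytes(value, nbytes):
    """Rotate a 32-bit word left by nbytes bytes (0 to 3)."""
    bits = 8 * (nbytes & 3)
    value &= MASK32
    return ((value << bits) | (value >> (32 - bits))) & MASK32

def rotlb(src1, src2):
    """Rotate left by the byte count in the low two bits of Source 2."""
    return rotl_bytes(src1, src2 & 3)

def rotlcb(src1, src2):
    """Rotate left by the bit complement of the low two bits of Source 2."""
    return rotl_bytes(src1, ~src2 & 3)
```

With Source 1 = AB01CD23#16 and Source 2 = 51#16, rotlb rotates by 1 byte and rotlcb by 2 bytes, matching the two worked examples in the instruction descriptions.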


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4.4.16.  sh  -  Shift 


Instruction format fields: TY | OP | Src1 | Src2 | Dest | Comp Func(12) = 00100bbbdddd


Assembler

sh rSrc1,rSrc2,rDest,#shift amount

Operation

Reg(Dest) <= Bottom shift amount bits of Reg(Src2) || Top 32-shift amount bits of Reg(Src1)

Description

The shifter is a funnel shifter that concatenates Source 2 as the high order word with Source 1 and the shift amount is
used to select a 32-bit field as the result. The range of shift amount is from 1 to 32.

The encoding of the shift amount is explained in the description of the asr instruction. For example, the instruction
    sh r4,r2,r5,#7

places in r5 the bottom 7 bits of r2 (in the high order position) concatenated with the top 25 bits of r4. The bits to
specify the shift amount are determined by first doing (32-7) to get 25. Then encode 25 to get 1100010.

The following table gives some more examples:

Assume

    Reg(Src1) = 89ABCDEF#16
    Reg(Src2) = 12345670#16

    Shift Amount    bbbdddd      Result
    0               Not Valid
    1               1111000      44D5E6F7
    4               1110001      089ABCDE
    16              1000001      567089AB
    28              0010001      23456708
    31              0000010      2468ACE1
    32              0000001      12345670
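
The funnel shift can be reproduced in Python by forming the 64-bit concatenation and selecting a 32-bit field from it. This is a sketch for checking the table above; the function name sh mirrors the mnemonic but the code is illustrative, not a description of the hardware.

```python
MASK32 = 0xFFFFFFFF

def sh(src1, src2, shift):
    """Funnel shift: bottom `shift` bits of src2 || top (32 - shift) bits of src1."""
    if not 1 <= shift <= 32:
        raise ValueError("shift amount must be in the range 1 to 32")
    wide = ((src2 & MASK32) << 32) | (src1 & MASK32)   # Source 2 is the high order word
    return (wide >> shift) & MASK32
```

With Source 1 = 89ABCDEF#16 and Source 2 = 12345670#16, sh(0x89ABCDEF, 0x12345670, 16) returns 567089AB#16, as in the table.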

Assembler 

nop 

Operation 

Reg(0) <= Reg(0) + Reg(0)

Description

This instruction does not do much except take time and space. It is implemented as
    add r0,r0,r0
4.5. Compute Immediate Instructions

The compute immediate instructions have one source and one destination register. They provide a means to load a
17-bit constant that is stored as part of the instruction. Some of the instructions are used to access the special registers
described in Section 2.3. In general, instructions that do not fit in with any of the other groups are placed here.




4.5.1.  addi  -  Add  Immediate 


Immed(17)


Assembler

addi rSrc1,#Immed,rDest

Operation

Reg(Dest) <= SignExtend(Immed) + Reg(Src1)

Description 

The  value  of  the  signed  immediate  constant  is  added  to  Source  1  and  the  result  is  stored  in  Destination. 
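
As an illustration of the sign extension, the sketch below models addi in Python. The helper names are hypothetical, and wrapping the sum to 32 bits is an assumption about the register width rather than something stated on this page.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_17(immed: int) -> int:
    """Interpret the low 17 bits of immed as a two's complement value."""
    immed &= 0x1FFFF
    # Bit 16 of the field is the sign bit of the 17-bit constant.
    return immed - 0x20000 if immed & 0x10000 else immed


def addi(src1: int, immed: int) -> int:
    """Model of addi: add the sign-extended immediate to Source 1 (32-bit wrap assumed)."""
    return (src1 + sign_extend_17(immed)) & MASK32


print(sign_extend_17(0x1FFFF), addi(5, 0x1FFFF))
```

An immediate field of all ones therefore represents -1, and the largest positive constant is 65535.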




Assembler 

jpc 

Operation 

PCnext <= PC-4

Description

The PC chain should have been loaded with the 3 return addresses. PCnext is loaded with the contents of PC-4,
which should contain a return address used for returning from an exception to user space.

This instruction should be the second and third of 3 jumps using the addresses in the PC chain. The first jump in the
sequence should be jpcrs, which also causes some state bits to change.


4.5.3.  jpcrs  -  Jump  PC  and  Restore  State 


TY   OP                        Comp Func(12)
11   111   000000000000000     000000000011


Assembler

jpcrs

Operation

PC shifting enabled
PSWcurrent <= PSWother
PCnext <= PC-4

Description

The PC chain should have been loaded with the 3 return addresses. PCnext is loaded with the contents of PC-4,
which should contain the first return address when returning from an exception to user space.

This instruction should be the first of 3 jumps using the addresses in the PC chain. The next two instructions should
be jpc instructions that jump to the 2 other instructions needed to restart the machine.

The machine changes from system to user state at the end of the ALU cycle of the jpcrs instruction. The PSW is
changed at this time as well.

When this instruction is executed in user state, the PSW is not changed. The effective result is a jump using the
contents of PC-4 as the destination address.




4.5.4.  jspci  -  Jump  Indexed  and  Store  PC 


Immed(17)


Assembler

jspci rSrc1,#Immed,rDest

Operation

PC <= Reg(Src1) + SignExtend(Immed)

Reg(Dest) <= PCcurrent + 1

Description

This instruction has two delay slots. The address of the instruction after the two delay slots is stored in the
Destination register. This is the return location. The immediate value is sign extended and added to the contents of
Source 1. This is the jump destination, so it is jammed into the PC. The displacement is a 17-bit signed word
displacement.

This  instruction  provides  a  fast  linking  mechanism  to  subroutines  that  are  called  via  a  trap  vector. 


4.5.5. movfrs - Move from Special Register


TY   OP            Dest            Comp Func(12)
11   011   00000   -----   00000   000000000 Spec


Assembler

movfrs SpecialReg,rDest

Operation

Reg(Dest) <= Reg(Spec)

Description

This instruction is used to copy the special registers described in Section 2.3 into a general register. The contents of
the special register are put in the destination register. The value used in the Spec field for each of the special registers is
shown in the table below along with the assembler mnemonic.


SpecialReg   Spec
psw          001
md           010
pcm4         100

The  PSW  (psw)  can  be  read  in  both  system  and  user  state. 

A  move  from  pcm4  causes  the  PC  chain  to  shift  after  the  move. 


4.5.6. movtos - Move to Special Register


Spec


Assembler

movtos rSrc1,SpecialReg

Operation

Reg(Spec) <= Reg(Src1)

Description

This instruction is used to load the special registers described in Section 2.3. The contents of the Source 1 register are
put in the special register. The value used in the Spec field for each of the special registers is shown in the table below
along with the assembler mnemonic.
4.5.7. trap - Trap Unconditionally


TY   OP                              Vector(8)
11   110   00000   00000   000000    vvvvvvvv   011


Assembler

trap Vector

Operation

Stop PC Shifting
PC <= Vector << 3
PSWother <= PSWcurrent

Description

The shifting of the PC chain is stopped and the PC is loaded with the contents of the Vector field shifted left by 3
bits. The PSW of the user space is saved.

This is an unconditional trap. The instruction is used to go to a system space routine from user space. The state of
the machine changes from user to system after the ALU cycle of the trap instruction.

The trap instruction cannot be placed in the first delay slot of a branch, jspci, jpc, or jpcrs instruction. See Appendix
VI for more details.

The assembler should convert Vector to its one's complement form before generating the machine instruction, i.e.,
the machine instruction contains the one's complement of the vector.
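
The two transformations applied to a trap vector can be sketched as follows. This Python fragment is only an illustration: the function names are hypothetical, and the 8-bit width of the Vector field is taken from the opcode map in Appendix II.

```python
VECTOR_MASK = 0xFF  # the opcode map shows an 8-bit vector field


def trap_target(vector: int) -> int:
    """Address loaded into the PC: the vector shifted left by 3 bits."""
    return (vector & VECTOR_MASK) << 3


def trap_vector_field(vector: int) -> int:
    """Value the assembler stores in the instruction: the one's complement of the vector."""
    return (~vector) & VECTOR_MASK


print(trap_target(5), f"{trap_vector_field(5):08b}")
```

Complementing the stored field again recovers the original vector, so the encoding loses no information.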


4.5.8. hsc - Halt and Spontaneously Combust


TY  OP _ _ 

It  110  01)1  1  1  1  no  oooo  10  oooo  op  o_o  o  o  ooooooi 


Assembler 

hsc 

Operation 

Reg(31) <= PC

The processor stops fetching instructions and self-destructs.

Note that the contents of Reg(31) are actually lost.

Description 

This  is  executed  by  the  processor  when  a  protection  violation  is  detected.  It  is  a  privileged  instruction  available  only 
on  the  -NSA  versions  of  the  processor. 




Appendix  I 

Some  Programming  Issues 

This appendix contains some programming issues that must be stated but have not been included elsewhere in this
document.

1. Address 0 in both system and user space should have a nop instruction. When an exception occurs during
a squashed branch, the PCs for the instructions that have been squashed are set to 0 so that when these
instructions are restarted they will not affect any state. The nop at address 0 is also convenient for some
sequences when it is necessary to load a null instruction into the PC chain.

2. The instruction cache contains valid bits for each of the 32 buffers. There is also a bit to indicate whether
the buffer contains system or user space instructions. When it is necessary to invalidate the instruction
cache entries for a context switch between user processes, a system space routine is executed that jumps to
32 strategic locations to force all of the system bits to be set in the tags. Thus when the new user process
begins, the cache is flushed of the previous user process. An example code sequence is shown at the end
of this appendix.

3. After an interrupt occurs, no registers should be accessed for two instructions so that the tags in the bypass
registers can be flushed. If a register access is done, then it is possible that the instruction will get values
out of the bypass registers written by the previous context instead of the register file. This should not be a
problem because the PCs must be saved first anyway. Since this happens in system space, the interrupt
handler can just be written so that the improper bypassing does not occur.

4. There is no instruction that can be used to implement synchronization primitives such as test-and-set. The
proposed method is to use Dekker's algorithm or some other software scheme [3], but if this proves to be
insufficient then a load-locked instruction can be implemented as a coprocessor instruction for the cache
controller. This instruction will lock the bus until another coprocessor instruction is used to unlock it.
This can be used to implement a read-modify-write cycle.

5. A long constant can be loaded with the following sequence:

        .data
label1:
        .word  0xABCD1234
        .text
        ld     label1[r0],r5

r5 now contains ABCD1234#16

6. If a privileged instruction is executed in user space none of the state bits can be changed. This means that
writing the PSW becomes a nop. Reading the PSW returns the correct value. Trying to execute a jpcrs
only does a jump to the address in PC-4 and does not change the PSW. There is no trap taken for a
privilege violation.

7. Characters can be inserted and extracted with the following sequences:

; For each of these examples, assume
;   r2 initially contains stuv
;   r3 initially contains wxyz
; where s, t, u, v, w, x, y and z are byte values.
;
; Byte insertion - byte u gets replaced by w
;
        addi    r0,#2,r1
        rotlb   r2,r1,r2        ; r2 <-- uvst
        sh      r3,r2,r2,#24    ; r2 <-- vstw
        rotlcb  r2,r1,r2        ; r2 <-- stwv
;
; Extract byte - extract byte u from r2 and place it in r3
;
        addi    r0,#2,r1
        rotlb   r2,r1,r3        ; r3 <-- uvst
        sh      r3,r0,r3,#24    ; r3 <-- u
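
The two sequences can be traced with a small model of the three instructions involved. This is a hedged Python sketch: the function names are hypothetical, and it assumes that rotlb rotates by the value of the two low-order bits of its second operand while rotlcb rotates by their complement, with sh behaving as the funnel shifter described in Section 4.4.16.

```python
MASK32 = 0xFFFFFFFF


def _rotl_bytes(x: int, n: int) -> int:
    bits = 8 * (n & 3)
    x &= MASK32
    return ((x << bits) | (x >> (32 - bits))) & MASK32


def rotlb(src1: int, src2: int) -> int:
    return _rotl_bytes(src1, src2 & 3)


def rotlcb(src1: int, src2: int) -> int:
    return _rotl_bytes(src1, ~src2 & 3)


def sh(src1: int, src2: int, amount: int) -> int:
    return ((((src2 & MASK32) << 32) | (src1 & MASK32)) >> amount) & MASK32


def insert_byte(r2: int, r3: int) -> int:
    """Replace byte 2 of r2 with byte 0 of r3, following the insertion sequence."""
    r1 = 2
    r2 = rotlb(r2, r1)       # uvst
    r2 = sh(r3, r2, 24)      # vstw
    return rotlcb(r2, r1)    # stwv


def extract_byte(r2: int) -> int:
    """Return byte 2 of r2 in the low-order byte, following the extraction sequence."""
    r3 = rotlb(r2, 2)        # uvst
    return sh(r3, 0, 24)     # 000u


# s, t, u, v = 0x73..0x76 and w, x, y, z = 0x77..0x7A
print(f"{insert_byte(0x73747576, 0x7778797A):08X}", f"{extract_byte(0x73747576):02X}")
```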
Appendix II
Opcode Map

This is a summary of how the bits in the instruction opcodes have been assigned. The first sections show how
the bits in the OP and Comp Func fields are assigned. Then the opcode map of the complete instruction set is
given.


II.1. OP Field Bit Assignments

The OP bits are bits 2-4 in all instructions. For memory type instructions the bits have no particular meaning by
themselves. For branch type instructions the bits in the OP field (also known as the Cond field) are assigned as follows:

Bit 2      Set to 0 if branch on condition true, set to 1 if branch on condition false

Bits 3-4   Condition upon which the branch decision is made. 00 - unused, 01 - Z, 10 - C, 11 - N ⊕ V

For compute type instructions the bits are assigned as follows:

Bit 2      Set to 1 if the ALU always drives the result bus for the instruction

Bit 3      Set to 0

Bit 4      Set to 1 if the shifter always drives the result bus for the instruction

For compute immediate type instructions the bits are assigned as follows:

Bit 2      Set to 1 if the ALU always drives the result bus for the instruction

Bits 3-4   These bits have no particular meaning by themselves


II.2. Comp Func Field Bit Assignments

The Comp Func bits are bits 20 through 31 in the compute and compute immediate type instructions. The bits are
assigned according to whether they are being used by the ALU or the shifter. The bits for the ALU are assigned in the
following way:


Bits 20-22   Unused

Bit 23       Set to 1 for dstep, 0 otherwise

Bit 24       Set to 1 for multiply instructions (mstart, mstep), 0 otherwise

Bit 25       Carry in to the ALU

Bits 26-29   Input to the F function block.

             Bit 26   Src1 · Src2

             Bit 27   Src1 · ¬Src2

             Bit 28   ¬Src1 · Src2

             Bit 29   ¬Src1 · ¬Src2


Bits 30-31   Input to the G function block.

             Bit 30   0 for ALU add operation, 1 otherwise

             Bit 31   0 for ALU subtract operation, 1 otherwise
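
One way to read the F function block is as a four-entry truth table selected bitwise by the two operands. The Python sketch below is an illustration under an assumption: the minterm assigned to each of bits 26-29 (Src1·Src2, Src1·¬Src2, ¬Src1·Src2, ¬Src1·¬Src2) is inferred from the Comp Func values listed for and, or, xor and not in the opcode map, not taken from a hardware description.

```python
MASK32 = 0xFFFFFFFF


def alu_f(f_bits: int, src1: int, src2: int) -> int:
    """Evaluate the F block; f_bits holds Comp Func bits 26-29, bit 26 most significant."""
    n1, n2 = ~src1 & MASK32, ~src2 & MASK32
    # Minterms in bit order 26, 27, 28, 29 (assumed assignment, see lead-in).
    minterms = [src1 & src2, src1 & n2, n1 & src2, n1 & n2]
    result = 0
    for index, term in enumerate(minterms):
        if f_bits & (0b1000 >> index):
            result |= term
    return result & MASK32


a, b = 0x0F0F3355, 0x00FF5533
print(f"{alu_f(0b1000, a, b):08X}")  # and: Comp Func 000000100011
print(f"{alu_f(0b1110, a, b):08X}")  # or:  Comp Func 000000111011
print(f"{alu_f(0b0110, a, b):08X}")  # xor: Comp Func 000000011011
```

Under this reading, the add and sub entries select the xor and xnor functions respectively, which is the propagate term an adder needs.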

The bits for the shifter are assigned as follows:


Bits 20-21   Unused

Bit 22       Set to 1 for funnel shift operation (sh instruction)

Bit 23       Set to 1 for arithmetic shift operation (asr instruction)

Bit 24       Set to 1 for byte rotate instructions (rotlb, rotlcb)

Bit 25       For byte rotate instructions, set to 1 if rotlb, 0 if rotlcb

Bits 25-31   Shift amount for funnel and arithmetic shift operations (sh and asr instructions). The range is 0 to
             31 bits. Although this can be encoded in five bits, the two low-order bits are fully decoded;
             therefore, the field is seven bits. The two low-order bits are decoded as follows: 0 = bit 31, 1 = bit
             30, 2 = bit 29, 3 = bit 28. For example, a shift amount of 30 would become 1110100 in this
             seven-bit encoding scheme.
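
The seven-bit scheme can be expressed compactly: the upper three bits of the five-bit value are kept as bbb, and the lower two bits are expanded into a one-hot dddd. The Python sketch below uses hypothetical function names; the `encode_sh` helper additionally assumes the (32 - amount) step described for the sh instruction.

```python
def encode_field(value: int) -> str:
    """Encode a 5-bit value (0 to 31) as the 7-bit bbbdddd field."""
    if not 0 <= value <= 31:
        raise ValueError("value must be in the range 0 to 31")
    bbb = value >> 2               # upper three bits are stored as-is
    dddd = 1 << (value & 0b11)     # lower two bits are fully decoded (one-hot)
    return f"{bbb:03b}{dddd:04b}"


def encode_sh(amount: int) -> str:
    """Field for 'sh ...,#amount': the encoded value is 32 - amount."""
    return encode_field(32 - amount)


print(encode_field(30), encode_sh(7))
```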


II.3. Opcode Map of All Instructions

Memory Instructions


Instruction   TY   OP    Comments
ld            10   000   *
st            10   010
ldf           10   100   *
stf           10   110
ldt           10   001   *
stt           10   011
movfrc        10   101   Src1=0, *
movtoc        10   111   Src1=0
aluc          10   101   Src1=0, Dest=0

Branch Instructions


Instruction   TY   COND
beq           00   001
bge           00   111
bhs           00   010
blo           00   110
blt           00   011
bne           00   101

Compute Instructions

Instruction   TY   OP    Comp Func      Comments
add           01   100   000000011001
dstep         01   000   000101100110
mstart        01   000   000011100110   Src1=0
mstep         01   000   000010011001
sub           01   100   000001100110
subnc         01   100   000000100110
and           01   100   000000100011
bic           01   100   000000001011   Src2=0
not           01   100   000000001111   Src2=0
or            01   100   000000111011
xor           01   100   000000011011
mov           01   100   000000011001   Src2=0
asr           01   001   00010bbbdddd   Src2=0, bbbdddd=rotate amount
rotlb         01   001   000011000000
rotlcb        01   001   000010000000
sh            01   001   00100bbbdddd   bbbdddd=rotate amount
nop           01   100   000000011001   Src1=0, Src2=0, Dest=0

Compute Immediate Instructions

Instruction   TY   OP    Comp Func      Comments
addi          11   100   Immed          * (Immed is a 17-bit signed constant)
jspci         11   000   Immed          * (Immed is a 17-bit signed constant)
jpc           11   101   000000000011   *
jpcrs         11   111   000000000011
movfrs        11   011   000000000rrr   rrr = special register
movtos        11   010   000000000rrr   rrr = special register
trap          11   110   0vvvvvvvv011   Src1=0, vvvvvvvv=vector
unused        11   001

A star (*) indicates an instruction that has its Dest field in the position where the Src2 field normally sits. This can
also be determined by decoding the MSB of the type field and the middle bit of the OP field.
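
The field boundaries and the star rule can be illustrated with a small decoder. This Python sketch rests on assumptions: the positions of Src1, Src2 and Dest (bits 5-9, 10-14 and 15-19) are inferred from the field order shown in the instruction formats, and the star rule is read as "MSB of TY is 1 and the middle OP bit is 0", which matches the starred rows of the tables. The names are hypothetical.

```python
def fields(word: int) -> dict:
    """Split a 32-bit instruction word into fields (bit 0 is the MSB)."""
    return {
        "TY": (word >> 30) & 0b11,          # bits 0-1
        "OP": (word >> 27) & 0b111,         # bits 2-4
        "Src1": (word >> 22) & 0b11111,     # bits 5-9 (assumed position)
        "Src2": (word >> 17) & 0b11111,     # bits 10-14 (assumed position)
        "Dest": (word >> 12) & 0b11111,     # bits 15-19 (assumed position)
        "CompFunc": word & 0xFFF,           # bits 20-31
    }


def dest_in_src2_position(word: int) -> bool:
    """Star rule: MSB of the type field set and middle bit of the OP field clear."""
    ty = (word >> 30) & 0b11
    op = (word >> 27) & 0b111
    return bool(ty & 0b10) and not (op & 0b010)


NOP = 0b01_100_00000_00000_00000_000000011001   # add r0,r0,r0
print(hex(NOP), fields(NOP))
```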
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Appendix  III 

Floating  Point  Instructions 

This  describes  the  floating  point  opcodes  and  formats  of  the  instructions  implemented  in  the  MIPS-X  Instruction 
Level  Simulator  ( milsx ). 

III.1. Format

All floating point numbers are represented in one 32-bit word as shown in Fig. III-1. The fields represent the
following floating point number:
(-1)^s x 2^(exp-127) x (1 + fraction)

This is an approximate IEEE floating point format.


s (1 bit) | exp (8 bits) | fraction (23 bits)


Figure III-1: Floating Point Number Format
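
The formula can be evaluated directly from the three fields. The Python sketch below is an illustration, not the simulator's code; it assumes the sign bit is the most significant bit of the word and that the fraction field is scaled by 2^-23. Because the formula is applied literally, an all-zero word does not decode to zero, which is one sense in which the format is only approximately IEEE.

```python
import struct


def decode(word: int) -> float:
    """Apply (-1)^s x 2^(exp-127) x (1 + fraction) to a 32-bit word."""
    s = (word >> 31) & 1
    exp = (word >> 23) & 0xFF
    fraction = (word & 0x7FFFFF) / 2**23
    return (-1) ** s * 2.0 ** (exp - 127) * (1 + fraction)


def ieee_single(word: int) -> float:
    """Reference: interpret the same bits as an IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


for word in (0x3F800000, 0xC0200000, 0x42280000):
    print(f"{word:08X}", decode(word), ieee_single(word))
```

For normalized values the two interpretations agree.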


III.2. Instruction Timing

All floating point instructions are assumed to take one cycle to execute. More realistic timing numbers can be
derived by multiplying the number output by mils by an appropriate constant.


III.3. Load and Store Instructions

There are 16 floating point registers. They are loaded and stored using the ldf and stf instructions defined in the
instruction set. Moves between the floating point registers and the main processor are done using the movif and movfi
instructions. These use the movtoc and movfrc formats defined in the instruction set. Note that only 4 of the 5 bits that
specify a floating point register in the ldf, stf, movif and movfi instructions are used.


III.4. Floating Point Compute Instructions

The format of the floating point compute instructions is the one shown in the description of the aluc coprocessor
instruction. The coprocessor number (COP#) is 0 for the floating point coprocessor. The Func field specifies the
floating point operation to be performed.
III.5. Opcode Map of Floating Point Instructions

In the following table:

r1,r2 are cpu registers from r0..r31

f1,f2 are floating point registers from f0..f15

n is an integer expression


Instruction      TY   OP    Func     Operation           Comments
fadd   f1,f2     10   101   000000   f2 <= f1 + f2       Src1=0, Dest=0
fsub   f1,f2     10   101   000001   f2 <= f1 - f2       Src1=0, Dest=0
fmul   f1,f2     10   101   000010   f2 <= f1 x f2       Src1=0, Dest=0
fdiv   f1,f2     10   101   000011   f2 <= f1 / f2       Src1=0, Dest=0
cvtif  f1,f2     10   101   000100   f2 <= float(f1)     Src1=0, Dest=0; Convert int to float
cvtfi  f1,f2     10   101   000101   f2 <= int(f1)       Src1=0, Dest=0; Convert float to int
imul   f1,f2     10   101   000110   f2 <= f1 x f2       Src1=0, Dest=0; Integer multiplication
idiv   f1,f2     10   101   000111   f2 <= f1 / f2       Src1=0, Dest=0; Integer division
mod    f1,f2     10   101   001000   f2 <= f1 mod f2     Src1=0, Dest=0; Integer mod
movif  r1,f1     10   111   001001   f1 <= r1            Src1=0, CS1=0
movfi  f1,r1     10   101   001010   r1 <= f1            Src1=0, CS2=0
ldf    n[r1],f1  10   100                                See instruction page
stf    n[r1],f1  10   110                                See instruction page




Appendix  IV 

Integer  Multiplication  and  Division 

This appendix describes the multiplication and division support on MIPS-X. The philosophy behind why the current
implementation was chosen is described first and then the instructions for doing multiplication and division are
described.


IV.1. Multiplication and Division Support

The goal of the multiplication and division support in MIPS-X is to provide a reasonable amount of support with the
smallest amount of hardware possible. Speedups can be obtained by realizing that most integer multiplications are
used to obtain a 32-bit result, not a 64-bit result. The result is usually the input to another operation, or it is the address
of an array index. In either case a number larger than 32 bits would not make sense. Since the result is less than 32
bits, one of the operands is most likely to be less than 16 bits or there will be an overflow. In general this means that
only about 16 1-bit multiplication or division steps are required to generate the final answer. For very small constants,
instructions can be generated inline instead of using a general multiplication or division routine. Therefore, it was felt
that there was no great advantage to implementing a scheme that could do more than 1 bit at a time, such as Booth
multiplication.

The other advantage of only generating a 32-bit result is that it is possible to do multiplication starting at the MSB of
the multiplier, meaning that the same hardware can be used for multiplication and division. The required hardware is a
single register, the MD register, that can shift left by one bit each cycle, and an additional multiplexer at the source 1
input of the ALU that selects the input or two times the input for the source 1 operand.


IV.2. Multiplication

Multiplication is done with the simple 1-bit shift and add algorithm except that the computation is started from the
most significant bit instead of the least significant bit of the multiplier. The instruction that implements one step of the
algorithm is called mstep. For
mstep rSrc1,rSrc2,rDest
the operation is:

If the MSB of the MD register is 1
then
    rDest <= 2 x rSrc1 + rSrc2
else
    rDest <= 2 x rSrc1
Shift left MD


For signed multiplication, the first step is different from the rest. If the MSB of the multiplier is 1, the multiplicand
should be subtracted from 0. The instruction called mstart is provided for this purpose. For
mstart rSrc2,rDest
the operation is

If the MSB of the MD register is 1
then
    rDest <= 0 - rSrc2
else
    rDest <= 0
Shift left MD

To show the simplest implementation of a multiplication routine, assume that the following registers have been
assigned and loaded:

rMer is the multiplier,
rMand is the multiplicand,
rDest is the result register,
rLink is the jump linkage register.

Then,

        movtos  rMer,rMD            ;Move the multiplier into MD
        nop                         ;Needed for hardware timing reasons - see movtos
        mstart  rMand,rDest         ;Do the first mstep. Result goes into rDest
        mstep   rDest,rMand,rDest   ;Repeat 31 times
        jspci   rLink,#0,r0         ;Return
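
The effect of one mstart followed by 31 msteps can be checked with a register-level model. This Python sketch is illustrative only: the function names are hypothetical, registers are modelled as 32-bit values with wraparound, and delay slots and pipeline timing are ignored.

```python
MASK32 = 0xFFFFFFFF


def multiply(multiplier: int, multiplicand: int) -> int:
    """Signed 32-bit multiply using one mstart and 31 msteps, MSB first."""
    md = multiplier & MASK32
    mand = multiplicand & MASK32

    # mstart: if the MSB of MD is 1 the multiplicand is subtracted from 0.
    dest = (-mand) & MASK32 if md >> 31 else 0
    md = (md << 1) & MASK32

    # mstep, repeated 31 times: rDest <= 2 x rDest (+ rMand if MSB of MD is 1).
    for _ in range(31):
        dest = ((dest << 1) + (mand if md >> 31 else 0)) & MASK32
        md = (md << 1) & MASK32
    return dest


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    return value - (1 << 32) if value >> 31 else value


print(to_signed(multiply(7, -3)), to_signed(multiply(-5, -6)))
```

The sign bit of the multiplier carries weight -2^31 in two's complement, which is why the first step subtracts instead of adding.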

It is possible to speed up the routine by using the assumption described previously that the numbers will not both be
a full 32 bits long. The simplest scheme is to check to see if the multiplier is less than 8 bits long. Some statistics
indicate that this occurs frequently.

The routine shown in Figure IV-1 implements multiplication with fewer than 32 msteps on average. It will actually do
a full 32 msteps if it is necessary. In this case it is most likely that overflow will occur and this can be detected if the V
bit in the PSW is clear so that a trap on overflow will occur. Assume that the registers rMer, rMand and rDest have
been assigned and loaded as in the previous example. Two temporary registers, rTemp1 and rTemp2, are also required.

The number of cycles required, not including the instructions needed for the call sequence, is shown in Table IV-1.
Compare this with the simple routine using just 32 steps, which requires 35 instructions to do the multiplication, and a
Booth 2-bit algorithm that will need about 19 instructions. It can be observed that if most multiplications require 8 or
fewer msteps, then this routine will be faster than just doing 32 msteps all the time.

IV.3.  Division 

For division, the same set of hardware is used, except the ALU is controlled differently. The algorithm is a restoring
division algorithm. Both of the operands must be positive numbers. Signed division is not supported as it is too hard to
do for the hardware required [2].

The dividend is loaded in the MD register and the register that will contain the remainder (rRem) is initialized to 0.
The divisor is loaded into another register called rDor. The result of the division (quotient) will be in MD. For

dstep rRem,rDor,rRem
the operation is:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; MUL:
;       fast, unchecked, signed multiply
;       rLink = link
;       rMand = src2
;       rDest = rMer = src1/dest
;       rTemp1 = temp
;       rTemp2 = temp
;
;       Note: This code has been reorganized
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

MUL:
        asr     rMer,rTemp2,#7          ; Test for positive 8-bit number
        bne     rTemp2,r0,lnot8
        sh      r0,rMer,rTemp1,#24      ; assume 8 bit
        movtos  rTemp1,md
        mstart  rMand,rDest             ; may need nop before this
        mstep   rDest,rMand,rDest
lmul8bit:
        mstep   rDest,rMand,rDest
        mstep   rDest,rMand,rDest
        mstep   rDest,rMand,rDest
        mstep   rDest,rMand,rDest
        jspci   rLink,#0,r0
        mstep   rDest,rMand,rDest
        mstep   rDest,rMand,rDest
lnot8:
        addi    rTemp2,#1,rTemp2
        beqsq   rTemp2,r0,lmul8bit      ; 8 bit negative
        mstart  rMand,rDest
        mstep   rDest,rMand,rDest
        movtos  rDest,md                ; do full 32 bits
        mstart  rMand,rDest             ; may need nop before this
        mstep   rDest,rMand,rDest
        mstep   rDest,rMand,rDest
        mstep   rDest,rMand,rDest
        mstep   rDest,rMand,rDest
        ...                             ; 24 msteps
        mstep   rDest,rMand,rDest
        jspci   rLink,#0,r0
        mstep   rDest,rMand,rDest
        mstep   rDest,rMand,rDest


Number of msteps needed                        8    32

Number of cycles with positive multiplier     13    42

Number of cycles with negative multiplier     15    42

Table IV-1: Number of Cycles Needed to do a Multiplication

Set ALUsrc1 input to 2 x rRem + MSB(rMD)
Set ALUsrc2 input to rDor
ALUoutput <= ALUsrc1 - ALUsrc2

If MSB(ALUoutput) is 1
then
    rRem <= ALUsrc1
    rMD <= 2 x rMD
else
    rRem <= ALUoutput
    rMD <= 2 x rMD + 1

At the end of 32 dsteps the quotient will be in the MD register, and the remainder is in rRem.
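
The dstep operation can be modelled directly from the description above. The Python sketch below is a simplified illustration with hypothetical names: it assumes both operands are positive 31-bit values, as the text requires, and ignores pipeline timing.

```python
MASK32 = 0xFFFFFFFF


def dstep(rem: int, md: int, dor: int) -> tuple:
    """One restoring-division step; returns the new (rRem, MD) pair."""
    alu_src1 = ((rem << 1) | (md >> 31)) & MASK32    # 2 x rRem + MSB(MD)
    alu_out = (alu_src1 - dor) & MASK32
    if alu_out >> 31:
        # Negative difference: restore the remainder, shift a 0 into the quotient.
        return alu_src1, (md << 1) & MASK32
    # Non-negative difference: keep it, shift a 1 into the quotient.
    return alu_out, ((md << 1) | 1) & MASK32


def divide(dividend: int, divisor: int) -> tuple:
    """Run 32 dsteps; returns (quotient, remainder) for positive operands."""
    rem, md = 0, dividend & MASK32
    for _ in range(32):
        rem, md = dstep(rem, md, divisor)
    return md, rem


print(divide(100, 7))
```

The dividend is shifted out of the top of MD one bit per step while quotient bits enter at the bottom, so a single register holds both.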

A routine for doing division is shown in Figure IV-2. The dividend is passed in rDend and the divisor in rDor. At
the end, the quotient is in MD and rQuot and the remainder is in rRem. Note that rDend and rRem can be the same
register, and rDor and rQuot can be the same register. The dividend and divisor are checked to make sure they are
positive. This routine does a 32-bit by 32-bit division so no overflow can occur.

The number of cycles needed, not including the calling sequence and assuming the operands are positive, is shown in
Table IV-2.


Number  of  dsteps  needed  8  32 

Number  of  cycles  needed  34  60 

Table  IV-2:  Number  of  Cycles  Needed  to  do  a  Divide 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; DIV:
;       fast, unchecked, signed divide (should check for zero divide)
;       rLink = link
;       rDend, rRem = src1 (dividend)
;       rDor = rQuot = src2/dest (divisor/quotient)
;       rTemp1 = temp (trashed)
;       rTemp2 = temp (trashed)
;
;       Note: This code has been reorganized
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

DIV:
        mov     rDend,rTemp2
        bge     rDend,r0,lcinit1        ; dividend > 0 ?
        nop
        nop
        sub     r0,rDend,rDend          ; make dividend > 0
lcinit1:
        bgesq   rDor,r0,lcinit2         ; divisor > 0 ?
        addi    r0,#0xff,rTemp1         ; check for 8-bit dividend
        nop
        sub     r0,rTemp2,rTemp2        ; rTemp2 > 0 if positive result
        sub     r0,rDor,rDor            ; make divisor > 0
        addi    r0,#0xff,rTemp1
lcinit2:
        bltsq   rTemp1,rDend,ldivfull   ; do 8-bit check
        movtos  rDend,md                ; start 32-bit divide
        mov     r0,rRem
        sh      r0,rDend,rDend,#8       ; shift up divisor to do 8 bits
        movtos  rDend,md                ; start 8-bit divide
        beq     r0,r0,ldivloop
        mov     r0,rRem
        addi    r0,#8,rTemp1            ; loop counter
ldivfull:
        addi    r0,#32,rTemp1           ; do full 32 dsteps
ldivloop:
        dstep   rRem,rDor,rRem
        dstep   rRem,rDor,rRem
ldivloopr:
        dstep   rRem,rDor,rRem
        dstep   rRem,rDor,rRem
        dstep   rRem,rDor,rRem
        dstep   rRem,rDor,rRem
        dstep   rRem,rDor,rRem
        addi    rTemp1,#-8,rTemp1       ; decrement loop counter
        dstep   rRem,rDor,rRem
        bnesq   rTemp1,r0,ldivloopr
        dstep   rRem,rDor,rRem
        dstep   rRem,rDor,rRem
        movfrs  md,rQuot                ; get result
        bge     rTemp2,r0,lcinit3       ; check if need to adjust sign of result
        nop
        nop
        sub     r0,rQuot,rQuot          ; adjust sign of result
lcinit3:
        jspci   rLink,#0,rLink          ; return
        nop
        nop


Figure IV-2: Signed Integer Division
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Appendix  V 

Multiprecision  Arithmetic 

Multiprecision arithmetic is not a high priority but it is desirable to make it possible to do. The minimal support
necessary will be provided. The most straightforward way to do this would seem to be the addition of a carry bit to the
PSW. However, this turns out to be extremely difficult.

The following program segments are examples of doing double precision addition and subtraction. The only
addition required to the instruction set is the Subtract with No Carry (subnc) instruction. This is only an addition to the
assembly language and not to the hardware.


Assume that there are 2 double precision operands (A and B) and a double precision result to be computed (C).
Assume that the necessary registers have been loaded.

;Double precision addition

        add     rAhi,rBhi,rChi      ;add high words
        sub     r0,rBlo,rClo        ;get -rBlo; branch does subtract
                                    ;check to see if carry generated
        bhssq   rAlo,rClo,l1        ;branch if carry set
        addi    rChi,#1,rChi        ;add 1 to high word if carry
        nop
l1:     add     rAlo,rBlo,rClo      ;add low words


;Double precision subtraction

        subnc   rAhi,rBhi,rChi      ;subtract high words
                                    ;check if subtract of low
                                    ;words generates a carry
        bhssq   rAlo,rBlo,l1        ;branch if carry set
        addi    rChi,#1,rChi        ;add 1 to high word if carry
        nop
l1:     sub     rAlo,rBlo,rClo      ;subtract low words
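
The subtraction sequence can be checked against ordinary 64-bit arithmetic. The Python sketch below is illustrative and rests on an assumption: subnc is modelled as A - B - 1 (a subtract with the carry-in held at 0), which is inferred from the Comp Func encodings of sub and subnc in Appendix II. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def subnc(a: int, b: int) -> int:
    """Subtract with no carry, modelled as a - b - 1 (assumed semantics)."""
    return (a - b - 1) & MASK32


def double_sub(ahi: int, alo: int, bhi: int, blo: int) -> tuple:
    """Double precision C = A - B following the sequence above."""
    chi = subnc(ahi, bhi)               # subtract high words, minus 1
    if alo >= blo:                      # bhs: unsigned compare, no borrow from low words
        chi = (chi + 1) & MASK32        # add 1 back to the high word
    clo = (alo - blo) & MASK32          # subtract low words
    return chi, clo


a, b = 0x0000000100000000, 0x0000000000000001
chi, clo = double_sub(a >> 32, a & MASK32, b >> 32, b & MASK32)
print(f"{chi:08X}{clo:08X}")
```

The high word is computed pessimistically, as if a borrow had occurred, and corrected only when the unsigned comparison of the low words shows that none did.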
Appendix VI
Exception Handling

An exception is defined as either an event that causes an interrupt or a trap instruction that can be thought of as a
software interrupt. The two sequences cause similar actions in the processor hardware. Because there is a branch delay
of 2, three PCs from the PC chain must be saved and restarted on an interrupt. Three PCs are needed in the event that a
branch has occurred and fallen off the end of the chain. The two branch slot instructions and the branch destination are
saved for restarting. Restarting a trap is slightly different and is explained later. See Section 2.4 for a description of the
PSW during interrupts, exceptions, and traps.


VI.1.  Interrupts 

Interrupts are asynchronous events that the programmer has no control over. Because there are several instructions
executing at the same time, it is necessary to save the PCs of all the instructions currently executing so that the machine
can be properly restarted after an interrupt. The PCs are held in the PC chain. When an interrupt occurs, the PC chain
is frozen (stops shifting in new values) to allow the interrupt routine to save the PCs of the three instructions that need
to be restarted. These are the PCs of the instructions that are in the RF, ALU and MEM cycles of execution. This
means that no further exceptions can occur while the PCs are being saved. When the interrupt sequence begins, the
interrupts are disabled, PSWcurrent is copied into PSWother and the machine begins execution in system state. The
contents of PSWother should be saved if interrupts are to be enabled before the return from the interrupt. The contents
of the MD register must also be saved and restored if any multiplication or division is done. If the interrupt routine is
very short and interrupts can be left off, it is possible to just leave the PC chain frozen; otherwise the three PCs must be
saved. To save the PCs use movfrs with PC-4 as the source. The PC chain shifts after each read of PC-4.

The interrupt routine will start execution at location 0. It must look at a register in the interrupt controller to
determine how to handle the interrupt. This sequence is yet to be specified.

To return from an interrupt, interrupts must first be disabled to allow the state of the machine to be restored. The
PSW must be restored and the PC chain loaded with the return addresses. The PC chain is loaded by writing to PC-1
and it shifts after each write to PC-1. The instructions are restarted by doing three jumps to the address in PC-4 with
shifting of the PC chain enabled. This means that the addresses will come out of the end of the chain and be
reloaded at the front in the desired order.

The first of the three jumps should be a jpcrs instruction. It will cause PSWother to be copied to PSWcurrent with
the interrupts turned on and the state returned to user space. The machine state changes after the ALU cycle of the first
jump. The last two instructions of the return jump sequence should be jpc instructions.
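
The save and restore of the PC chain can be pictured with a toy model. The Python sketch below is a simplification: it assumes a three-entry chain that behaves as a first-in, first-out queue, with reads of PC-4 taking the oldest entry and writes to PC-1 entering at the front. The class and method names are hypothetical, and the real chain and its freeze logic are not modelled.

```python
from collections import deque

class PCChain:
    """Toy FIFO model of the PC chain (assumed behaviour, not the hardware)."""

    def __init__(self, pcs=()):
        # Oldest PC first; the oldest entry sits at the PC-4 end.
        self.chain = deque(pcs)

    def read_pcm4(self):
        """Model of 'movfrs pcm4,rX': return the oldest PC, then shift."""
        return self.chain.popleft()

    def write_pcm1(self, pc):
        """Model of 'movtos rX,pcm1': shift, then load a PC at the front."""
        self.chain.append(pc)

def save_and_restart(chain):
    """Save three PCs, reload them, and return the restart order."""
    saved = [chain.read_pcm4() for _ in range(3)]   # rA, rB, rC
    for pc in saved:                                # reload in the same order
        chain.write_pcm1(pc)
    # Each of the three return jumps goes to the address at the PC-4 end.
    return [chain.read_pcm4() for _ in range(3)]
```

With the two branch slot instructions at word addresses 100 and 101 and a branch destination of 200, the model restarts the instructions in the order 100, 101, 200, which is the order in which they were saved.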

A problem arises because an exception could occur while restarting these instructions. The PC chain is then in a
state from which it is not possible to restart the sequence again using the standard sequence of first saving the PC chain.
The start of an exception sequence should first check the e bit in the PSW to see whether it is cleared. The e bit will be set
only when the PC chain is back in a normal state. If it is clear, then the state of the machine should not be resaved. The
state to use for restart should still be available in the process descriptor for the process being restarted when the
exception occurred. The sequence for interrupt handling is shown in Figure VI-1.

lret:   inst a                      ;instructions a, b and c are restarted
        inst b
        inst c
        -- interrupt --
        inst d
        inst e

inthlr: bra to save if e bit set    ;start of interrupt handler
        Do necessary fixes          ;e bit clear so don't save PC chain
        bra nosave
save:   Save PSWother               ;do save if interrupts to be enabled
        Save MD                     ;if necessary
        movfrs pcm4,rA              ;save PCs if necessary
        movfrs pcm4,rB
        movfrs pcm4,rC
nosave: Enable interrupts           ;if necessary and above saving done

        Process interrupts

        Disable interrupts
        Restore MD                  ;if necessary
        Restore PSWother            ;if necessary
        movtos rA,pcm1              ;restore PCs
        movtos rB,pcm1
        movtos rC,pcm1
        jpcrs                       ;this changes the PSW as well
        jpc                         ;doesn't touch PSW
        jpc

        execution begins at label lret

Figure VI-1: Interrupt Sequence

VI.2.  Trap  On  Overflow 

A trap on overflow (see Section 2.4.1) behaves exactly like an interrupt except that it is generated on-chip instead of
externally. This interrupt can be masked by setting the V bit in the PSW.

When a trap on overflow occurs, the O bit is set in the PSW. The exception handling routine must check this bit to
see if an overflow is the cause of the exception.
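
As an illustration of the condition involved, the Python sketch below models a 32-bit two's complement add that reports whether a trap on overflow would be taken. It assumes that the trap is taken only when the V bit is clear. The function name is hypothetical, and the exact PSW behaviour is defined in Section 2.4.1, not here.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000

def add_with_overflow_trap(a, b, v_bit):
    """Return (32-bit sum, trap_taken) for 32-bit operand bit patterns.

    trap_taken also stands for the O bit, which the text says is set
    when a trap on overflow occurs.
    """
    result = (a + b) & MASK32
    # Signed overflow: both operands share a sign and the result's sign differs.
    overflow = (~(a ^ b) & (a ^ result) & SIGN) != 0
    trap_taken = overflow and not v_bit   # V bit set masks the trap
    return result, trap_taken
```

For instance, adding 1 to 0x7FFFFFFF overflows and traps when the V bit is clear, while the same addition with the V bit set produces the wrapped result without a trap.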

VI.3.  Trap  Instructions 

Besides the Trap on Overflow, there is only one other type of trap available. It is an unconditional vectored trap to a
system space routine in low order memory. After the ALU cycle of the trap instruction the processor goes into system
state with the PC chain frozen. The instruction before the trap instruction will complete its WB cycle. The PSW is
saved by copying PSWcurrent to PSWother as described in Section 2.4. PSWcurrent is loaded as if this were an
interrupt.

Before interrupts can be turned on again, some processor state must be saved. The return PCs are currently in the PC
chain. Three PCs must be read from the PC chain and the third one saved in the process descriptor. It is the PC of the
instruction that is in the RF cycle. The instruction corresponding to the PC in MEM completes, so it need not be
restarted. The PC in the ALU cycle should not be restarted because it is the trap instruction. PSWother must be saved so
that the state of the prior process is preserved. If PSWother is not saved before interrupts are enabled, then another
interrupt will overwrite the PSW of the process that executed the trap before it can be saved.

All  trap  instructions  have  an  8-bit  vector  number  attached  to  them.  This  provides  256  legal  trap  addresses  in  system 
space.  These  addresses  are  8  locations  apart  to  provide  enough  space  to  store  some  jump  instructions  to  the  correct 
handler.  If  this  is  not  enough  vectors,  one  of  the  traps  can  take  a  register  as  an  argument  to  determine  the  action 
required. 
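
The vector spacing can be expressed as a one-line calculation. The Python sketch below assumes that the trap address is a base address plus eight locations per vector number. The base address is not given in this appendix, so it is a required parameter here, and the function name is hypothetical.

```python
def trap_vector_address(vector, base):
    """Address of the handler slot for an 8-bit trap vector number.

    'base' is an assumed start of the vector area; slots are 8 locations apart.
    """
    if not 0 <= vector <= 255:
        raise ValueError("vector number must fit in 8 bits")
    return base + 8 * vector
```

With an assumed base of 0x100, vector 1 maps to 0x108 and vector 255 maps to 0x8F8, so the whole vector area spans 256 * 8 = 2048 locations.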

The return sequence must disable interrupts, restore the contents of PSWother and MD if they were saved and then
disable PC shifting so that the return address can be shifted into the PC chain. Two more addresses must be shifted in
as well so that the restart will look the same as an interrupt. This can be done by loading the addresses of two nop
instructions into the PC chain ahead of the return address. Three jumps to the addresses in the PC chain are then
executed using jpcrs and two jpcs. The first jump will copy the contents of PSWother into PSWcurrent and turn on PC
shifting. The processor state changes after the ALU cycle of the jpcrs. The change of state also enables interrupts and
puts the processor in user space.

If an interrupt occurs during the return sequence then the interrupt handler will look at the e bit in the PSW to
determine whether the state should be saved.

The  flow  of  code  for  taking  a  trap  and  returning  is  shown  in  Figure  VI-2. 

                                    ;instruction before trap
        trap vecnum                 ;trap instruction
lret:

vecnum: movfrs pcm4,r0
        movfrs pcm4,r0
        movfrs pcm4,r31             ;save this one to restart
        Save PSWother               ;if necessary
        Save MD                     ;if necessary
        Enable interrupts           ;if necessary and above saving done

        Process requested trap

        Disable interrupts          ;movtos x,pswc where x has M bit set
        Restore MD                  ;if necessary
        Restore PSWother            ;if necessary
        movtos r0,pcm1              ;assume a nop at 0
        movtos r0,pcm1
        movtos r31,pcm1
        jpcrs
        jpc
        jpc

        execution begins at label lret      ;instruction after trap

Figure VI-2: Trap Sequence

Appendix VII

Assembler Macros and Directives

This appendix¹ describes the macros and directives used by the MIPS-X assembler. Also provided is a full grammar
of the assembler for those who need more detail.

VII.1.  Macros 

Several  macros  are  provided  to  ease  the  process  of  writing  assembly  code.  These  allow  low  level  details  to  be 
hidden,  and  ease  the  generation  of  code  for  both  compilers  and  assembly  language  programmers. 

VII.1.1. Branches

bgt, ble        The assembler synthesizes these instructions by reversing the operands and using a blt or a bge
                instruction.
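
The operand reversal can be sketched as a small rewriting step. The Python function below is illustrative only and is not the assembler's actual code. It relies on the identities a > b if and only if b < a, and a <= b if and only if b >= a. The function name is hypothetical.

```python
# Macro mnemonic -> real instruction to use once the operands are swapped.
SWAPPED = {"bgt": "blt", "ble": "bge"}

def expand_branch(mnemonic, src1, src2, target):
    """Rewrite bgt/ble as blt/bge with reversed operands; pass others through."""
    real = SWAPPED.get(mnemonic)
    if real is None:
        return f"{mnemonic} {src1},{src2},{target}"
    return f"{real} {src2},{src1},{target}"
```

For example, expand_branch("bgt", "r1", "r2", "loop") yields "blt r2,r1,loop".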

VII.1.2. Shifts

lsr, lsl        These instructions are synthesized from the sh instruction. For example:

                        lsr r1,r2,#4

                shifts r1 four bits right and puts the result in r2.
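
One way to see how logical shifts can be built from a single shift primitive is the Python sketch below. It assumes that sh behaves like a conventional funnel shifter, which extracts 32 bits from the concatenation of two registers. The operand encoding of sh is not described in this appendix, so the function names and argument order are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def funnel_shift(hi, lo, n):
    """Low 32 bits of the 64-bit value hi:lo shifted right by n (0 <= n <= 32)."""
    return (((hi << 32) | lo) >> n) & MASK32

def lsr(x, n):
    """Logical shift right: zeros are funnelled in from the high side."""
    return funnel_shift(0, x, n)

def lsl(x, n):
    """Logical shift left: x sits in the high word and zeros come in from below."""
    return funnel_shift(x, 0, 32 - n)
```

Under this model, lsr(0x80000000, 4) gives 0x08000000 and lsl(0x0000000F, 4) gives 0x000000F0.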

VII.1.3. Procedure Call and Return

pjsr subroutine,#exp1,reg2
                A simple procedure call. The stack pointer is decremented by exp1. The return address is
                stored on the stack. On return, the stack pointer is restored. Reg2 is used as a temporary.
                No registers are saved.

ipjsr reg1,#exp1,reg2
ipjsr exp2,reg1,#exp1,reg2
                A call to a subroutine determined at run time. The particular subroutine address must be
                in a register (reg1) or be addressable off a register (exp2 + reg1). The stack pointer and
                the return address handling is identical to pjsr. Reg2 is used as a temporary.

ret             Jump to the return address stored by a pjsr or ipjsr macro.

VII.2.  Directives 

.text           Signals the beginning or resumption of the text segment. This allows code to be grouped into one
                area. Labels in the text segment have word values.

.data           Signals the beginning or resumption of the data segment. Labels in the data segment have byte
                values. Ordering within the data segment is not changed.

.end            Signals the end of the module.

.eop            Signals the end of a procedure. No branches are allowed to cross procedure boundaries. This
                directive was added to reduce the memory requirements of the assembler. Reorganization can be
                done by procedure instead of by module.

.ascii "xxx"    Allows a string literal to be put in the data segment.

.word exp       Initializes a word of memory.

.float number   Initializes a floating point literal.

id = exp        Sets an assembly-time constant. This allows a code generator to emit code before the values of
                certain offsets and literals are known. The assembler will resolve expressions using this identifier
                for aliasing calculations etc.

.def id = exp   Sets a link-time constant. The identifier will be global.

.noreorg        Allows reorganization to be turned off in local areas.

.reorgon        Turns reorganization back on.

.comm id,n      Defines a labeled common area of n words. Common area names are always global.

.globl id       Makes an identifier global or accessible outside the module. The .globl statement must appear
                before the id is otherwise used. All procedure entry points should be made global, otherwise the
                code may be removed as dead.

.lit r1,r2,...
.lif r5,r10,... Give a list of registers that are live for the following branches. .lit is for registers live if the branch
                is taken and .lif is for registers live if the branch is not taken. Liveness information is used for
                interblock reorganization and branch scheduling.

¹Provided by Scott McFarling


VII.3.  Example 

;program 1+1 = 2?

        .data
label1:
        .word   1
        .text
        .globl  _main
_main:
        ld      label1[r0],r1
        addi    r1,#1,r1
        addi    r0,#2,r2
        bne     r1,r2,error
        ret
error:
        trap    1
        ret
        .end

VII.4.  Grammar 

file            : file line

line            : \n
                | COMMENT \n                    { comment = ;.* }
                | statement COMMENT \n
                | statement \n

statement       : label
                | binALUState
                | monALUState
                | specState
                | nopState
                | addiState
                | jspciState
                | shiftState
                | loadState
                | storeState
                | branchState
                | copState
                | miscState
                | directState
                | macroState

label           : ID :                          { ID must be in column 1 }

binALUState     : binALUOp reg,reg,reg

binALUOp        : ADD
                | SUB
                | AND
                | OR
                | XOR
                | ROTLB
                | ROTLCB
                | MSTEP
                | DSTEP
                | SUBNC
                | BIC

monALUState     : monOp reg,reg
                | MSTART reg,reg

monOp           : NOT
                | MOV

specState       : MOVTOS reg,specialReg
                | MOVFRS specialReg,reg

specialReg      : MD
                | PSW
                | PCM4
                | PCM1

nopState        : NOP

addiState       : ADDI reg,#exp,reg

jspciState      : JSPCI reg,#exp,reg

shiftState      : ASR reg,reg,#exp
                | SH reg,reg,reg,#exp
                | LSR reg,reg,#exp
                | LSL reg,reg,#exp

loadState       : LD exp[reg],reg
                | LD #exp,reg                   { adds constant to literal pool and loads it }
                | LDT exp[reg],reg
                | LDF exp[reg],freg

storeState      : ST exp[reg],reg
                | STT exp[reg],reg
                | STF exp[reg],freg

branchState     : branchOp reg,reg,ID
                | branchSqOp reg,reg,ID
                | BRA ID

branchOp        : BEQ
                | BNE
                | BGE
                | BGT
                | BHI
                | BHS
                | BLE
                | BLO
                | BLS
                | BLT

branchSqOp      : BEQSQ
                | BNESQ
                | BGESQ
                | BGTSQ
                | BHISQ
                | BHSSQ
                | BLESQ
                | BLOSQ
                | BLSSQ
                | BLTSQ

copState        : MOVTOC exp,reg
                | MOVFRC exp,reg
                | ALOC exp
                | floatBinOp freg,freg
                | floatMonOp freg,freg
                | MOVIF reg,freg
                | MOVFI freg,reg

floatBinOp      : FADD
                | FSUB
                | FMUL
                | FDIV
                | IMUL
                | IDIV
                | MOD

floatMonOp      : CVTIF
                | CVTFI

miscState       : TRAP exp
                | JPC
                | JPCRS

directState     : TEXT
                | DATA
                | END
                | EOP
                | ASCII STRING                  { string: ".*" }
                | WORD exp
                | FLOAT FLOATCONSTANT
                | ID = exp
                | DEF ID = exp
                | REORGON
                | NOREORG
                | COMM ID,INT
                | GLOBL ID
                | LIT liveList
                | LIF liveList

liveList        : reg
                | liveList,reg

macroState      : PJSR ID,#exp,reg
                | IPJSR reg,#exp,reg
                | IPJSR exp,reg,#exp,reg
                | RET

exp             : exp addOp term
                | - factor
                | term

addOp           : +

term            : term multOp factor
                | factor

multOp          : *

factor          : ( exp )
                | ID
                | INT
                | HEXINT                        { like C: 0x12fe }

reg             : REG                           { r0..r31 }

freg            : FREG                          { f0..f15 }


notes:

1) only labels and directives may start in column 1

2) Keywords are shown in upper case just to make them
   stand out. In reality, they MUST be lower case.

3) directives begin with a .
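
To illustrate how the exp, term and factor rules nest, the following Python sketch evaluates constant expressions with a small recursive descent parser. It is not the assembler's own code. It implements only the operators visible in the grammar as printed here (binary +, binary *, and a leading unary minus applied to a factor), and it assumes C-like identifiers and a symbol table passed in by the caller. All names are hypothetical.

```python
import re

# HEXINT must be tried before INT so that "0x12fe" is one token.
TOKEN = re.compile(r"\s*(0[xX][0-9a-fA-F]+|\d+|[A-Za-z_]\w*|[-+*()])")

def tokenize(text):
    """Split an expression into HEXINT, INT, ID and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"bad character near column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens, symbols):
        self.tokens, self.pos, self.symbols = tokens, 0, symbols

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def exp(self):
        # exp : exp addOp term | - factor | term   (left recursion as a loop)
        if self.peek() == "-":
            self.take()
            value = -self.factor()
        else:
            value = self.term()
        while self.peek() == "+":
            self.take()
            value += self.term()
        return value

    def term(self):
        # term : term multOp factor | factor
        value = self.factor()
        while self.peek() == "*":
            self.take()
            value *= self.factor()
        return value

    def factor(self):
        # factor : ( exp ) | ID | INT | HEXINT
        tok = self.take()
        if tok == "(":
            value = self.exp()
            if self.take() != ")":
                raise ValueError("expected )")
            return value
        if tok[0].isdigit():
            return int(tok, 16) if tok[:2].lower() == "0x" else int(tok, 10)
        if tok[0].isalpha() or tok[0] == "_":
            return self.symbols[tok]
        raise ValueError(f"unexpected token {tok!r}")

def evaluate(text, symbols=None):
    """Evaluate an expression; 'symbols' maps identifiers to integer values."""
    parser = Parser(tokenize(text), symbols or {})
    value = parser.exp()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value
```

Because the unary minus rule applies to a single factor at the start of an expression, this sketch accepts "-size+8" but rejects "-2*3", which follows the rules as written.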

